We present AnyBald, a novel framework for realistic hair removal from portrait images captured under diverse in-the-wild conditions. A key challenge in this task is the lack of high-quality paired data: existing datasets are often low-quality, with limited viewpoint variation and overall diversity, making real-world cases difficult to handle. To address this, we construct a scalable data augmentation pipeline that synthesizes high-quality hair and non-hair image pairs covering diverse real-world scenarios, enabling effective generalization through scalable supervision. With this enriched dataset, we present a new hair removal framework that reformulates pretrained latent diffusion inpainting with learnable text prompts, removing the need for explicit masks at inference. In doing so, our model achieves natural hair removal with semantic preservation via implicit localization. To further improve spatial precision, we introduce a regularization loss that guides the model's attention specifically to hair regions. Extensive experiments demonstrate that AnyBald outperforms existing methods in removing hairstyles while preserving identity and background semantics across diverse in-the-wild domains.
To achieve robust hair removal in the wild, AnyBald integrates three core components:
1. Paired Bald Augmentation Pipeline: We synthesize high-quality paired training data covering diverse poses and backgrounds to overcome the lack of real-world paired datasets.
2. Dual-Branch Mask-Free Diffusion Model: Our architecture employs a dual-branch design with learnable text prompts, enabling the model to selectively remove hair while preserving facial identity and semantic detail (a sketch of the learnable prompts follows this list).
3. Text Localization Loss: To enhance spatial precision without explicit masks, we introduce a regularization loss that guides the learnable prompts to attend specifically to hair regions (a sketch of one plausible loss formulation follows below).
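To make the mask-free conditioning concrete, below is a minimal PyTorch sketch of learnable prompt tokens that stand in for a fixed text embedding when conditioning a frozen, pretrained inpainting UNet. The token count, embedding width, and the diffusers-style `encoder_hidden_states` usage shown in the comment are assumptions of this sketch, not the released AnyBald implementation.

```python
import torch
import torch.nn as nn

class LearnablePrompt(nn.Module):
    """A bank of trainable token embeddings standing in for a text prompt.

    Hypothetical sketch: the token count and embedding width follow common
    Stable Diffusion settings (77 tokens, 768-dim); AnyBald's actual
    configuration may differ.
    """

    def __init__(self, num_tokens: int = 77, embed_dim: int = 768):
        super().__init__()
        # Small random init; trained jointly while the UNet stays frozen.
        self.tokens = nn.Parameter(torch.randn(num_tokens, embed_dim) * 0.02)

    def forward(self, batch_size: int) -> torch.Tensor:
        # Broadcast the shared prompt to the batch: (B, num_tokens, embed_dim).
        return self.tokens.unsqueeze(0).expand(batch_size, -1, -1)


# Usage with a diffusers-style conditional UNet (kept frozen):
#   prompt = LearnablePrompt()
#   noise_pred = unet(noisy_latents, timestep,
#                     encoder_hidden_states=prompt(noisy_latents.shape[0])).sample
# Only prompt.tokens receives gradients; no inpainting mask is needed at inference.
```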
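The text localization loss can be pictured as an attention-supervision term: during training, the cross-attention maps produced by the learnable prompt tokens are compared against a hair segmentation mask so that attention mass concentrates on hair pixels. Below is a minimal sketch assuming the maps are averaged over heads and layers and resized to the mask resolution; the exact formulation used by AnyBald may differ.

```python
import torch

def text_localization_loss(attn_maps: torch.Tensor,
                           hair_mask: torch.Tensor) -> torch.Tensor:
    """Penalize prompt-token attention that leaks outside the hair region.

    attn_maps: (B, T, H, W) cross-attention of the T learnable prompt tokens,
               averaged over heads/layers (an assumption of this sketch).
    hair_mask: (B, 1, H, W) binary hair segmentation, available only during
               training; no mask is required at inference.
    """
    # Average over prompt tokens, then normalize to a spatial distribution.
    attn = attn_maps.mean(dim=1, keepdim=True)                 # (B, 1, H, W)
    attn = attn / (attn.sum(dim=(2, 3), keepdim=True) + 1e-8)
    # Fraction of attention mass landing inside the hair mask.
    inside = (attn * hair_mask).sum(dim=(2, 3))                # (B, 1)
    # Maximizing in-mask mass is equivalent to minimizing leakage outside it.
    return (1.0 - inside).mean()
```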
We compare AnyBald with state-of-the-art methods on the CelebA in-the-wild dataset. Our approach demonstrates superior performance in completely removing hair while maintaining realistic skin textures and head shapes across diverse facial expressions and poses.
Results on the DeepFashion2 dataset show that AnyBald effectively handles complex real-world scenarios with various backgrounds, lighting conditions, and full-body compositions, while preserving facial identity and background consistency.
The bald images generated by our method facilitate more accurate 3D face reconstruction by revealing the full head structure.
3D reconstruction performed using DECA.
Our high-quality bald results serve as an excellent foundation for virtual hair try-on and transfer applications.
Hair transfer performed using Stable Hair's stage 2 module.
TBD (To be published at WACV 2026)
This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2022-00143911, AI Excellence Global Innovative Leader Education Program).