We present AnyBald, a novel framework for realistic hair removal from portrait images captured under diverse in-the-wild conditions. A key challenge in this task is the lack of high-quality paired data: existing datasets are often low-quality, with limited viewpoint variation and overall diversity, making real-world cases difficult to handle. To address this, we construct a scalable data augmentation pipeline that synthesizes high-quality hair and non-hair image pairs covering diverse real-world scenarios, enabling effective generalization through scalable supervision. With this enriched dataset, we present a new hair removal framework that reformulates pretrained latent diffusion inpainting with learnable text prompts, removing the need for explicit masks at inference. In doing so, our model achieves natural hair removal with semantic preservation via implicit localization. To further improve spatial precision, we introduce a regularization loss that guides the model's attention specifically to hair regions. Extensive experiments demonstrate that AnyBald outperforms existing methods in removing hairstyles while preserving identity and background semantics across diverse in-the-wild domains.
To achieve robust hair removal in the wild, AnyBald integrates three core components:
1. Paired Bald Augmentation Pipeline: We synthesize high-quality paired training data covering diverse poses and backgrounds to overcome the lack of real-world paired datasets.
2. Dual-Branch Mask-Free Diffusion Model: Our architecture employs a dual-branch design with learnable text prompts, enabling the model to selectively remove hair while preserving facial identity and semantic detail (a sketch of the learnable prompts follows this list).
3. Text Localization Loss: To enhance spatial precision without explicit masks, we introduce a regularization loss that guides the learnable prompts to attend specifically to hair regions (a sketch of one plausible loss formulation follows below).
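To make the mask-free conditioning concrete, below is a minimal PyTorch sketch of learnable prompt tokens that stand in for a fixed text embedding when conditioning a frozen, pretrained inpainting UNet. The token count, embedding width, and the diffusers-style `encoder_hidden_states` usage shown in the comment are assumptions of this sketch, not the released AnyBald implementation.

```python
import torch
import torch.nn as nn

class LearnablePrompt(nn.Module):
    """A bank of trainable token embeddings standing in for a text prompt.

    Hypothetical sketch: the token count and embedding width follow common
    Stable Diffusion settings (77 tokens, 768-dim); AnyBald's actual
    configuration may differ.
    """

    def __init__(self, num_tokens: int = 77, embed_dim: int = 768):
        super().__init__()
        # Small random init; trained jointly while the UNet stays frozen.
        self.tokens = nn.Parameter(torch.randn(num_tokens, embed_dim) * 0.02)

    def forward(self, batch_size: int) -> torch.Tensor:
        # Broadcast the shared prompt to the batch: (B, num_tokens, embed_dim).
        return self.tokens.unsqueeze(0).expand(batch_size, -1, -1)


# Usage with a diffusers-style conditional UNet (kept frozen):
#   prompt = LearnablePrompt()
#   noise_pred = unet(noisy_latents, timestep,
#                     encoder_hidden_states=prompt(noisy_latents.shape[0])).sample
# Only prompt.tokens receives gradients; no inpainting mask is needed at inference.
```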
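The text localization loss can be pictured as an attention-supervision term: during training, the cross-attention maps produced by the learnable prompt tokens are compared against a hair segmentation mask so that attention mass concentrates on hair pixels. Below is a minimal sketch assuming the maps are averaged over heads and layers and resized to the mask resolution; the exact formulation used by AnyBald may differ.

```python
import torch

def text_localization_loss(attn_maps: torch.Tensor,
                           hair_mask: torch.Tensor) -> torch.Tensor:
    """Penalize prompt-token attention that leaks outside the hair region.

    attn_maps: (B, T, H, W) cross-attention of the T learnable prompt tokens,
               averaged over heads/layers (an assumption of this sketch).
    hair_mask: (B, 1, H, W) binary hair segmentation, available only during
               training; no mask is required at inference.
    """
    # Average over prompt tokens, then normalize to a spatial distribution.
    attn = attn_maps.mean(dim=1, keepdim=True)                 # (B, 1, H, W)
    attn = attn / (attn.sum(dim=(2, 3), keepdim=True) + 1e-8)
    # Fraction of attention mass landing inside the hair mask.
    inside = (attn * hair_mask).sum(dim=(2, 3))                # (B, 1)
    # Maximizing in-mask mass is equivalent to minimizing leakage outside it.
    return (1.0 - inside).mean()
```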
We compare AnyBald with state-of-the-art methods on the CelebA in-the-wild dataset. Our approach demonstrates superior performance in completely removing hair while maintaining realistic skin textures and head shapes across diverse facial expressions and poses.
Results on the DeepFashion2 dataset show that AnyBald effectively handles complex real-world scenarios with various backgrounds, lighting conditions, and full-body compositions, while preserving facial identity and background consistency.
The bald images generated by our method facilitate more accurate 3D face reconstruction by revealing the full head structure.
3D reconstruction performed using DECA.
Our high-quality bald results serve as an excellent foundation for virtual hair try-on and transfer applications.
Hair transfer performed using Stable Hair's stage 2 module.
TBD (To be published at WACV 2026)
This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2022-00143911, AI Excellence Global Innovative Leader Education Program).