UniHuman: A Unified Model For Editing Human Images in the Wild
Nannan Li, Qing Liu, Krishna Kumar Singh, Yilin Wang, Jianming Zhang, Bryan A. Plummer, Zhe Lin
Human image editing includes tasks like changing a person's pose or clothing, or editing the image according to a text prompt. However, prior work often tackles these tasks separately, overlooking the benefit of mutual reinforcement from learning them jointly. In this paper, we propose UniHuman, a unified model that addresses multiple facets of human image editing in real-world settings. To enhance the model's generation quality and generalization capacity, we leverage guidance from human visual encoders and introduce a lightweight pose-warping module that can exploit different pose representations, accommodating unseen textures and patterns. Furthermore, to bridge the disparity between existing human editing benchmarks and real-world data, we curated 400K high-quality human image-text pairs for training and collected 2K human images for out-of-domain testing, both encompassing diverse clothing styles, backgrounds, and age groups. Experiments on both in-domain and out-of-domain test sets demonstrate that UniHuman outperforms task-specific models by a significant margin. In user studies, UniHuman is preferred by users in an average of 77% of cases. Our project is available at https://github.com/NannanLi999/UniHuman.