Handling the Details: A Two-Stage Diffusion Approach to Improving Hands in Human Image Generation

Anton Pelykh, Ozge Mercanoglu Sincan, Richard Bowden

Journal: IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 7, no. 4, pp. 890–901
DOI: 10.1109/TBIOM.2025.3577085
Publication date: 2025-06-05 (Journal Article) · Impact Factor: 5.0 · Citations: 0
URL: https://ieeexplore.ieee.org/document/11025996/

Abstract

There has been significant progress in human image generation in recent years, particularly with the introduction of diffusion models. However, existing methods struggle to produce consistent hand anatomy, and the generated images often lack precise control over hand pose. To address this limitation, we introduce a novel two-stage approach to pose-conditioned human image generation: we first generate detailed hands and then outpaint the body around them. We propose training the hand generator in a multi-task setting to produce both hand images and their corresponding segmentation masks, and we employ the trained model in the first stage of generation. An adapted ControlNet model is then used in the second stage to outpaint the body. We introduce a novel blending technique that combines the results of both stages coherently and preserves the hand details: the outpainted region is expanded sequentially while the latent representations are fused, ensuring a seamless and cohesive synthesis of the final image. Experimental evaluations on the HaGRID and YouTube-ASL datasets demonstrate the superiority of our proposed method over state-of-the-art techniques in both pose accuracy and image quality. Our approach not only enhances the quality of the generated hands but also offers improved control over hand pose, advancing the capabilities of pose-conditioned human image generation. We make the code available.
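The abstract's blending step operates on diffusion latents during denoising; the following NumPy sketch is our own illustration of only the geometric idea, not the authors' code. It assumes a hard hand mask and shows how a fused region can be expanded outward step by step so the transition between the hand-stage and body-stage results is gradual rather than a hard seam (the function names `dilate` and `blend_latents` are hypothetical).

```python
import numpy as np

def dilate(mask: np.ndarray) -> np.ndarray:
    """Grow a boolean mask by one pixel using a 4-neighbourhood."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out

def blend_latents(hand_latent: np.ndarray,
                  body_latent: np.ndarray,
                  hand_mask: np.ndarray,
                  steps: int = 3) -> np.ndarray:
    """Illustrative sequential blend: the region influenced by the
    hand-stage result expands by one pixel per step, and pixels closer
    to the hand are pulled more strongly toward the hand latent."""
    fused = body_latent.astype(float).copy()
    region = hand_mask.copy()
    for _ in range(steps):
        region = dilate(region)
        # inside the expanded region, average toward the hand-stage
        # latent; pixels nearer the hand are blended on more iterations
        fused = np.where(region, 0.5 * fused + 0.5 * hand_latent, fused)
    # the hand pixels themselves are always taken verbatim
    return np.where(hand_mask, hand_latent, fused)
```

With a constant hand latent of 1 and body latent of 0, the result is 1 inside the hand mask, decays smoothly (0.875, 0.75, 0.5) across the three-pixel transition band, and stays 0 far from the hands.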
Source journal metrics: CiteScore 10.90 · Self-citation rate 0.00%