Highly realistic synthetic dataset for pixel-level DensePose estimation via diffusion model

IF 7.5 | Zone 1, Computer Science | JCR Q1, Computer Science, Artificial Intelligence

Pattern Recognition, Volume 159, Article 111137 | Pub Date: 2024-11-06 | DOI: 10.1016/j.patcog.2024.111137

Jiaxiao Wen, Tao Chu, Qiong Liu

Citations: 0

Abstract

Generating training data with pixel-level annotations for DensePose is a labor-intensive task, resulting in sparse labeling in real-world datasets. Prior solutions have relied on specialized data generation systems to synthesize datasets. However, these synthetic datasets often lack realism and rely on expensive resources such as human body models and texture mappings. In this paper, we address these challenges by introducing a novel data generation method based on the diffusion model, effectively producing highly realistic data without the need for expensive resources. Specifically, our method comprises annotation generation and image generation. Utilizing graphic renderers and SMPL models, we produce synthetic annotations solely based on human poses and shapes. Subsequently, guided by these annotations, we employ simple yet effective textual prompts to generate a wide range of realistic images using the diffusion model. Our experiments conducted on DensePose-COCO dataset demonstrate the superiority of our method compared to existing methods. Code and benchmarks will be released.
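The two stages named in the abstract (annotation generation, then annotation-guided image generation from text prompts) can be pictured with a minimal sketch of the hand-off between them. Everything below is an assumption made for illustration and is not taken from the paper: the prompt vocabulary, the `SyntheticSample` record, the function names, and the 8-bit packing of a per-pixel annotation. The only facts relied on are that DensePose labels each foreground pixel with one of 24 body-part indices (0 is background) and with surface coordinates U and V in [0, 1].

```python
import random
from dataclasses import dataclass

# Hypothetical prompt vocabulary; the abstract does not list the actual prompts.
SUBJECTS = ["a man", "a woman", "a child"]
CLOTHING = ["in a red jacket", "in sportswear", "in a business suit"]
SCENES = ["on a city street", "in a park", "in a living room"]

NUM_BODY_PARTS = 24  # DensePose part indices run from 1 to 24; 0 is background


@dataclass
class SyntheticSample:
    annotation_id: str  # identifier of one rendered annotation (hypothetical)
    prompt: str         # text prompt paired with that annotation


def pack_iuv_pixel(part, u, v):
    """Pack one (part index, U, V) label into three 8-bit channel values.

    Storing the raw part index and scaling U, V to 0..255 is one common
    convention, assumed here; background pixels are packed as all zeros.
    """
    if not 0 <= part <= NUM_BODY_PARTS:
        raise ValueError(f"part index out of range: {part}")
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        raise ValueError("U and V must lie in [0, 1]")
    if part == 0:
        return (0, 0, 0)
    return (part, int(u * 255 + 0.5), int(v * 255 + 0.5))


def build_prompt(rng):
    """Compose a simple text prompt from the hypothetical vocabulary."""
    subject = rng.choice(SUBJECTS)
    clothing = rng.choice(CLOTHING)
    scene = rng.choice(SCENES)
    return f"a photo of {subject} {clothing} {scene}, realistic"


def pair_annotations_with_prompts(annotation_ids, prompts_per_annotation, seed=0):
    """Pair each annotation with several prompts so that one rendered pose
    can yield several differently textured images."""
    rng = random.Random(seed)
    return [
        SyntheticSample(annotation_id, build_prompt(rng))
        for annotation_id in annotation_ids
        for _ in range(prompts_per_annotation)
    ]
```

In a sketch like this, each `SyntheticSample` would then be passed to an annotation-conditioned diffusion model; that step is omitted because the abstract does not specify the model or its interface.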

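Source journal

Pattern Recognition (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months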

About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.