{"title":"通过扩散模型的时空组合生成复杂的三维人体运动","authors":"Lorenzo Mandelli, Stefano Berretti","doi":"arxiv-2409.11920","DOIUrl":null,"url":null,"abstract":"In this paper, we address the challenge of generating realistic 3D human\nmotions for action classes that were never seen during the training phase. Our\napproach involves decomposing complex actions into simpler movements,\nspecifically those observed during training, by leveraging the knowledge of\nhuman motion contained in GPTs models. These simpler movements are then\ncombined into a single, realistic animation using the properties of diffusion\nmodels. Our claim is that this decomposition and subsequent recombination of\nsimple movements can synthesize an animation that accurately represents the\ncomplex input action. This method operates during the inference phase and can\nbe integrated with any pre-trained diffusion model, enabling the synthesis of\nmotion classes not present in the training data. We evaluate our method by\ndividing two benchmark human motion datasets into basic and complex actions,\nand then compare its performance against the state-of-the-art.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models\",\"authors\":\"Lorenzo Mandelli, Stefano Berretti\",\"doi\":\"arxiv-2409.11920\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we address the challenge of generating realistic 3D human\\nmotions for action classes that were never seen during the training phase. Our\\napproach involves decomposing complex actions into simpler movements,\\nspecifically those observed during training, by leveraging the knowledge of\\nhuman motion contained in GPTs models. These simpler movements are then\\ncombined into a single, realistic animation using the properties of diffusion\\nmodels. Our claim is that this decomposition and subsequent recombination of\\nsimple movements can synthesize an animation that accurately represents the\\ncomplex input action. This method operates during the inference phase and can\\nbe integrated with any pre-trained diffusion model, enabling the synthesis of\\nmotion classes not present in the training data. 
We evaluate our method by\\ndividing two benchmark human motion datasets into basic and complex actions,\\nand then compare its performance against the state-of-the-art.\",\"PeriodicalId\":501130,\"journal\":{\"name\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.11920\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11920","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models
In this paper, we address the challenge of generating realistic 3D human
motions for action classes that were never seen during the training phase. Our
approach involves decomposing complex actions into simpler movements,
specifically those observed during training, by leveraging the knowledge of
human motion contained in GPT models. These simpler movements are then
combined into a single, realistic animation by exploiting the temporal and
spatial compositional properties of diffusion models. Our claim is that this
decomposition and subsequent recombination of
simple movements can synthesize an animation that accurately represents the
complex input action. This method operates during the inference phase and can
be integrated with any pre-trained diffusion model, enabling the synthesis of
motion classes not present in the training data. We evaluate our method by
dividing two benchmark human motion datasets into basic and complex actions,
and then comparing its performance against the state of the art.
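
To make the composition step concrete, below is a minimal sketch, not the paper's implementation, of how a pre-trained text-to-motion diffusion model's predictions for several sub-actions could be stitched together at a single denoising step: each sub-action owns a time interval (temporal composition) and a subset of joints (spatial composition). The model interface, the toy_model stand-in, the joint grouping, and the compose_x0 helper are all illustrative assumptions.

```python
import torch

def compose_x0(model, x_t, t, sub_actions):
    """One denoising step that composes several sub-action predictions.

    x_t: noisy motion tensor, shape (frames, joints, feat).
    sub_actions: list of dicts with
      'text'   - sub-action description (e.g., from an LLM decomposition),
      'frames' - slice over the time axis   (temporal composition),
      'joints' - index tensor over joints   (spatial composition).
    """
    # Unconditional prediction fills any region no sub-action covers
    # (an assumed interface: model(x_t, t, text) -> clean-motion estimate).
    x0 = model(x_t, t, "")
    for sa in sub_actions:
        pred = model(x_t, t, sa["text"])  # full-body, full-length prediction
        # Copy only this sub-action's frames and joints into the composite.
        x0[sa["frames"], sa["joints"]] = pred[sa["frames"], sa["joints"]]
    return x0

# Toy stand-in for a pre-trained text-to-motion diffusion model: it maps
# (noisy motion, step, text) -> a clean-motion estimate of the same shape.
def toy_model(x_t, t, text):
    return 0.9 * x_t  # placeholder denoiser, for illustration only

frames, joints, feat = 120, 22, 3
x_t = torch.randn(frames, joints, feat)

# A decomposition an LLM might return for "walk while waving" (hypothetical):
sub_actions = [
    {"text": "walk forward",    "frames": slice(0, 120), "joints": torch.arange(0, 17)},
    {"text": "wave right hand", "frames": slice(30, 90), "joints": torch.arange(17, 22)},
]

x0_hat = compose_x0(toy_model, x_t, t=500, sub_actions=sub_actions)
print(x0_hat.shape)  # torch.Size([120, 22, 3])
```

Starting from an unconditional prediction so that uncovered frames and joints still receive a plausible estimate is one simple design choice; overlapping regions could instead be blended rather than overwritten.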