SkeletonGAN: Fine-Grained Pose Synthesis of Human-Object Interactions

Qixuan Sun, Nanxi Chen, Ruipeng Zhang, Jiamao Li, Xiaolin Zhang

Proceedings of the 2023 6th International Conference on Machine Vision and Applications, 2023-03-10. DOI: 10.1145/3589572.3589579
Abstract
Synthesizing Human-Object Interactions (HOI) is a challenging problem because the human body has a complex and highly variable structure. Existing solutions can generate individual objects or faces very well but still struggle to generate realistic human bodies and their interactions with multiple objects. In this work, we focus on synthesizing human poses from HOI descriptive triplets and introduce a novel perspective that decomposes every action between a human and an object into sub-actions of individual body parts, generating body poses in a fine-grained way. We propose SkeletonGAN, a conditional generative adversarial model that performs body-parts-level control over the interaction between humans and objects. SkeletonGAN is trained and evaluated on the HICO-DET dataset, a knowledge base of complex interaction poses covering a wide variety of human-object actions in realistic scenarios. Qualitative and quantitative evaluations show that the model generates diverse and plausible poses consistent with the given semantic features; in particular, it can also predict the position of the object relative to the body pose. We also explore synthesizing composite poses that include co-occurring human actions, indicating that the model can learn multimodal relationships between human poses and the given conditional semantic features.
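To make the conditioning scheme concrete, the following is a minimal, hypothetical sketch of the generator side of a conditional GAN for pose synthesis: an embedding of the HOI triplet is concatenated with a noise vector and mapped to 2-D body-keypoint coordinates. All names, layer sizes, the keypoint count, and the toy vocabulary are illustrative assumptions, not the architecture described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_KEYPOINTS = 17      # a COCO-style skeleton (assumption)
EMBED_DIM = 64          # triplet-embedding size (assumption)
NOISE_DIM = 32
HIDDEN = 128

# Toy vocabulary for <human, verb, object> triplets (illustrative only).
verbs = {"ride": 0, "hold": 1}
objects = {"bicycle": 0, "cup": 1}

# Random embedding tables stand in for learned ones.
verb_table = rng.normal(size=(len(verbs), EMBED_DIM))
obj_table = rng.normal(size=(len(objects), EMBED_DIM))

# Two-layer MLP generator weights; in a real GAN these would be
# trained adversarially against a discriminator.
W1 = rng.normal(scale=0.1, size=(2 * EMBED_DIM + NOISE_DIM, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, NUM_KEYPOINTS * 2))

def generate_pose(verb, obj):
    """Map an HOI triplet plus a noise sample to (x, y) keypoints."""
    cond = np.concatenate([verb_table[verbs[verb]], obj_table[objects[obj]]])
    z = rng.normal(size=NOISE_DIM)           # noise gives pose diversity
    h = np.tanh(np.concatenate([cond, z]) @ W1)
    return (h @ W2).reshape(NUM_KEYPOINTS, 2)

pose = generate_pose("ride", "bicycle")
print(pose.shape)  # (17, 2)
```

Because the noise vector varies per call, repeated calls with the same triplet yield different but condition-consistent poses, which is the property the abstract's diversity claim rests on.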