{"title":"Text2Avatar:用文字说明铰接式3D Avatar创建","authors":"Yong-Hoon Kwon;Ju Hong Yoon;Min-Gyu Park","doi":"10.1109/TMM.2025.3535293","DOIUrl":null,"url":null,"abstract":"We propose a framework for creating articulated human avatars, editing their styles, and animating the human avatars from three different types of text instructions. The three types of instructions, identity, edit, and action, are fed into three models that generate, edit, and animate human avatars. Specifically, the proposed framework takes identity instruction and multi-view pose condition images to generate the images of a human using the avatar generation model. Then, the avatar can be edited with text instructions by changing the style of the images generated. We apply the Neural Radiance Field (NeRF) and Poisson reconstruction to extract a human mesh model from images and assign linear blend skinning (LBS) weights to the vertices. Finally, the action instructions can animate human avatars, where we use the off-the-shelf method to generate the motions from text instructions. Notably, our proposed method adapts the appearance of hundreds of different individuals to construct a conditionally editable avatar-generated model, allowing easy creation of 3D avatars using text instructions. 
We demonstrate high-fidelity 3D animatable avatar creation with text instructions on various datasets and highlight a superior performance of the proposed method compared to the previous studies.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"3797-3806"},"PeriodicalIF":9.7000,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Text2Avatar: Articulated 3D Avatar Creation With Text Instructions\",\"authors\":\"Yong-Hoon Kwon;Ju Hong Yoon;Min-Gyu Park\",\"doi\":\"10.1109/TMM.2025.3535293\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose a framework for creating articulated human avatars, editing their styles, and animating the human avatars from three different types of text instructions. The three types of instructions, identity, edit, and action, are fed into three models that generate, edit, and animate human avatars. Specifically, the proposed framework takes identity instruction and multi-view pose condition images to generate the images of a human using the avatar generation model. Then, the avatar can be edited with text instructions by changing the style of the images generated. We apply the Neural Radiance Field (NeRF) and Poisson reconstruction to extract a human mesh model from images and assign linear blend skinning (LBS) weights to the vertices. Finally, the action instructions can animate human avatars, where we use the off-the-shelf method to generate the motions from text instructions. Notably, our proposed method adapts the appearance of hundreds of different individuals to construct a conditionally editable avatar-generated model, allowing easy creation of 3D avatars using text instructions. 
We demonstrate high-fidelity 3D animatable avatar creation with text instructions on various datasets and highlight a superior performance of the proposed method compared to the previous studies.\",\"PeriodicalId\":13273,\"journal\":{\"name\":\"IEEE Transactions on Multimedia\",\"volume\":\"27 \",\"pages\":\"3797-3806\"},\"PeriodicalIF\":9.7000,\"publicationDate\":\"2025-01-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Multimedia\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10855528/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10855528/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
We propose a framework for creating articulated human avatars, editing their styles, and animating them from three types of text instructions. The three instruction types, identity, edit, and action, are fed into three models that generate, edit, and animate human avatars, respectively. Specifically, the framework takes an identity instruction together with multi-view pose condition images and generates images of a human using the avatar generation model. The avatar can then be edited with text instructions that change the style of the generated images. We apply Neural Radiance Fields (NeRF) and Poisson reconstruction to extract a human mesh model from the images and assign linear blend skinning (LBS) weights to its vertices. Finally, action instructions animate the avatar, where an off-the-shelf method generates motions from the text. Notably, our method adapts the appearances of hundreds of different individuals to construct a conditionally editable avatar generation model, allowing easy creation of 3D avatars from text instructions. We demonstrate high-fidelity, animatable 3D avatar creation with text instructions on various datasets and show superior performance compared to previous studies.
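The abstract mentions assigning linear blend skinning (LBS) weights to the mesh vertices so that action instructions can animate the avatar. As a rough illustration of what LBS computes (a generic sketch, not the authors' implementation; all names and shapes here are assumptions), each vertex is deformed by a weighted blend of per-bone rigid transforms:

```python
import numpy as np

def linear_blend_skinning(vertices, weights, bone_transforms):
    """Deform rest-pose vertices by blending per-bone rigid transforms.

    vertices:        (V, 3) rest-pose vertex positions
    weights:         (V, J) skinning weights; each row sums to 1
    bone_transforms: (J, 4, 4) homogeneous bone transforms
    Returns (V, 3) deformed vertex positions.
    """
    V = vertices.shape[0]
    # Lift vertices to homogeneous coordinates: (V, 4)
    homo = np.hstack([vertices, np.ones((V, 1))])
    # Blend the J bone transforms per vertex: (V, 4, 4)
    blended = np.einsum("vj,jab->vab", weights, bone_transforms)
    # Apply each vertex's blended transform to that vertex
    deformed = np.einsum("vab,vb->va", blended, homo)
    return deformed[:, :3]
```

With identity bone transforms the mesh is unchanged; a vertex weighted entirely to one bone follows that bone's transform rigidly, and mixed weights interpolate between bones, which is what makes the skinned mesh bend smoothly at joints.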
Journal description:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.