A method for synthesizing dynamic image of virtual human
Authors: Siyuan Shen, W. Zhang
Published in: 2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE)
Publication date: 2023-01-06
DOI: 10.1109/ICCECE58074.2023.10135229 (https://doi.org/10.1109/ICCECE58074.2023.10135229)
With the rise of the Metaverse, the need for efficient avatar modeling has become increasingly urgent, and building virtual human models from human image datasets has been an active topic in computer vision. We use speech synthesis to convert text into a speech waveform, then use a speech-to-lip-shape generation method to produce images of a real person with synchronized audio and video. Finally, we use a thin plate spline transformation method to drive the virtual human image, synthesizing a virtual human whose audio and video are synchronized. Experimental results show that the method effectively addresses two problems of text-driven avatars, lip mismatch and audio-video desynchronization, and that it can synthesize high-quality, high-fidelity, low-latency avatars.
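
The abstract does not say how the thin plate spline transformation is set up, so the following is only a minimal sketch of the classical thin plate spline (TPS) interpolation between two sets of 2D keypoints, and should not be read as the authors' implementation. The function names `fit_tps` and `apply_tps`, and the idea of using keypoints from a driving frame and an avatar image as control points, are assumptions made for illustration.

```python
import numpy as np


def _tps_kernel(r2):
    """TPS radial basis U(r) = r^2 * log(r), computed from squared distances.

    Uses r^2 * log(r) = 0.5 * r^2 * log(r^2), with U(0) defined as 0.
    """
    out = np.zeros_like(r2, dtype=float)
    mask = r2 > 0
    out[mask] = 0.5 * r2[mask] * np.log(r2[mask])
    return out


def fit_tps(src, dst):
    """Fit a 2D TPS that maps control points `src` (n, 2) onto `dst` (n, 2).

    Returns (w, a): non-affine weights of shape (n, 2) and affine
    coefficients of shape (3, 2). Assumes the control points are distinct
    and not all collinear, so the linear system is solvable.
    """
    n = src.shape[0]
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    K = _tps_kernel(d2)                      # (n, n) kernel matrix
    P = np.hstack([np.ones((n, 1)), src])    # (n, 3) rows of [1, x, y]

    # Block system [[K, P], [P^T, 0]] [w; a] = [dst; 0]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst

    params = np.linalg.solve(A, b)
    return params[:n], params[n:]


def apply_tps(points, src, w, a):
    """Warp arbitrary `points` (m, 2) with a TPS fitted on control points `src`."""
    d2 = np.sum((points[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    U = _tps_kernel(d2)                                      # (m, n)
    P = np.hstack([np.ones((points.shape[0], 1)), points])   # (m, 3)
    return U @ w + P @ a
```

In such a setup, `src` could hold keypoints detected on the avatar image and `dst` the corresponding keypoints of a lip-synced driving frame; applying the fitted warp to a grid of pixel coordinates would then give a dense deformation for that frame. How keypoints are obtained and how the warped image is rendered is left open here.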