Title: A method for synthesizing dynamic image of virtual human
Authors: Siyuan Shen, W. Zhang
Venue: 2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE)
Publication date: 2023-01-06
DOI: 10.1109/ICCECE58074.2023.10135229
URL: https://doi.org/10.1109/ICCECE58074.2023.10135229
Citation count: 0
Abstract
With the rise of the Metaverse, the need for efficient avatar modeling has become increasingly urgent. Building virtual human models from human image datasets has been a hot topic in computer vision. We use speech synthesis to convert text into a speech waveform, then use a speech-to-lip-shape generation method to produce images of a real person with synchronized audio and video. Finally, we use the thin plate spline transformation method to drive the virtual human image, synthesizing a virtual human whose audio and video are synchronized. Experimental results show that this method effectively addresses lip mismatch and audio-video desynchronization in text-driven avatars, and can synthesize high-quality, high-fidelity, low-latency avatars.
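The abstract does not say how the thin plate spline transformation is applied when driving the virtual human image. As a generic illustration of the transformation itself, and not the authors' implementation, the sketch below fits a classical 2D thin plate spline that maps a set of source keypoints onto target keypoints and then warps arbitrary points with it. The function names (`tps_kernel`, `fit_tps`, `apply_tps`) and the example keypoints are hypothetical, and the kernel U(r) = r^2 log(r^2) is one common convention assumed here.

```python
import numpy as np


def tps_kernel(r2):
    """Radial basis U(r) = r^2 * log(r^2), taking squared distances; U(0) = 0."""
    out = np.zeros_like(r2, dtype=float)
    mask = r2 > 0
    out[mask] = r2[mask] * np.log(r2[mask])
    return out


def fit_tps(src, dst):
    """Fit a 2D thin plate spline sending src keypoints (n, 2) to dst keypoints (n, 2).

    Returns (w, a): w has shape (n, 2) and holds the non-affine weights,
    a has shape (3, 2) and holds the affine part (offset, x, y coefficients).
    Assumes the keypoints are distinct and not all collinear.
    """
    n = src.shape[0]
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    K = tps_kernel(d2)
    P = np.hstack([np.ones((n, 1)), src])

    # Assemble the standard TPS linear system [[K, P], [P^T, 0]] [w; a] = [dst; 0].
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst

    params = np.linalg.solve(A, b)
    return params[:n], params[n:]


def apply_tps(points, src, w, a):
    """Warp points (m, 2) with a spline fitted on the src keypoints."""
    d2 = np.sum((points[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    U = tps_kernel(d2)
    P = np.hstack([np.ones((points.shape[0], 1)), points])
    return U @ w + P @ a


if __name__ == "__main__":
    # Made-up keypoints: four corners and a centre point that moves.
    src = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [1.0, 1.0]])
    dst = src.copy()
    dst[4] += np.array([0.2, 0.3])
    w, a = fit_tps(src, dst)
    print(apply_tps(np.array([[1.0, 1.5], [0.5, 0.5]]), src, w, a))
```

In an image-driving setting, a warp of this kind would typically be evaluated on a dense pixel grid to resample the source image; keypoint detection and image resampling are outside the scope of this sketch.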