Virtual Human Talking-Head Generation
Wenchao Song, Qiang He, Guowei Chen
Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning
Published: 2023-03-17
DOI: 10.1145/3590003.3590004 (https://doi.org/10.1145/3590003.3590004)
Abstract: Virtual humans created with deep learning are widely used in fields such as personal assistance, intelligent customer service, and online education. As one application of virtual humans, human-computer interaction systems integrate multimodal technologies such as speech recognition, dialogue systems, speech synthesis, and virtual digital human video synthesis. In this paper, we first design a framework for a human-computer interaction system based on a virtual human; next, we classify talking-head video synthesis models according to the depth of virtual human generation; finally, we systematically review the technical developments in talking-head video generation over the last five years, highlighting seminal work.