Creating a Speech Enabled Avatar from a Single Photograph

D. Bitouk, S. Nayar
Published in: 2008 IEEE Virtual Reality Conference, 2008-03-08
DOI: 10.1109/VR.2008.4480758
Citations: 10

Abstract

This paper presents a complete framework for creating a speech-enabled avatar from a single image of a person. Our approach uses a generic facial motion model which represents deformations of a prototype face during speech. We have developed an HMM-based facial animation algorithm which takes into account both lexical stress and coarticulation. This algorithm produces realistic animations of the prototype facial surface from either text or speech. The generic facial motion model can be transformed to a novel face geometry using a set of corresponding points between the prototype face surface and the novel face. Given a face photograph, a small number of manually selected features in the photograph are used to deform the prototype face surface. The deformed surface is then used to animate the face in the photograph. We show several examples of avatars that are driven by text and speech inputs.
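The animation step maps text or speech to facial motion while accounting for coarticulation (neighbouring phonemes influencing each other's mouth shapes). The paper models this with trained HMMs, which are not reproduced here; the sketch below only illustrates the coarticulation idea by building a per-frame control curve from timed phonemes and smoothing it. The viseme table and the scalar "mouth openness" parameter are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical phoneme-to-viseme targets (scalar "mouth openness");
# the paper learns facial deformations from data instead.
VISEME_TARGET = {"AA": 1.0, "IY": 0.4, "M": 0.0, "S": 0.2}

def viseme_track(phonemes, durations, fps=30, window=5):
    """Build a frame-rate control curve from timed phonemes, then smooth
    it with a moving average as a crude stand-in for coarticulation.
    `durations` are per-phoneme lengths in seconds."""
    frames = []
    for ph, dur in zip(phonemes, durations):
        frames += [VISEME_TARGET[ph]] * max(1, round(dur * fps))
    x = np.array(frames, dtype=float)
    kernel = np.ones(window) / window          # moving-average smoother
    return np.convolve(x, kernel, mode="same")
```

In a real system the smoothed curve would index into a learned deformation basis per frame; here it simply shows how abrupt per-phoneme targets become gradual transitions.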
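The transfer to a novel face described above rests on a set of corresponding points between the prototype surface and landmarks picked in the photograph. The abstract does not specify the interpolant used to propagate the landmark displacements to the whole surface; the following is a minimal sketch using radial-basis-function scattered-data interpolation (an r³ kernel, a common thin-plate-style choice in 3D), with all array names hypothetical.

```python
import numpy as np

def rbf_warp(prototype_pts, target_pts, surface_vertices):
    """Deform a prototype face surface so that a small set of landmark
    points on it move onto the corresponding landmarks of a novel face.

    prototype_pts:    (n, 3) landmarks on the prototype surface
    target_pts:       (n, 3) corresponding landmarks on the novel face
    surface_vertices: (m, 3) all vertices of the prototype surface
    """
    # Kernel matrix between landmarks: phi(r) = r^3.
    diff = prototype_pts[:, None, :] - prototype_pts[None, :, :]
    K = np.linalg.norm(diff, axis=-1) ** 3
    # Solve for per-landmark weights (tiny ridge term for stability).
    weights = np.linalg.solve(K + 1e-9 * np.eye(len(K)),
                              target_pts - prototype_pts)
    # Evaluate the interpolated displacement at every surface vertex.
    d = surface_vertices[:, None, :] - prototype_pts[None, :, :]
    Kv = np.linalg.norm(d, axis=-1) ** 3
    return surface_vertices + Kv @ weights
```

By construction the warp reproduces the landmark correspondences exactly (up to the ridge term) and falls off smoothly elsewhere, which is the behaviour a small set of manually selected photograph features needs to drive a whole surface.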