Face analysis for the synthesis of photo-realistic talking heads

Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580). Pub Date: 2000-03-26. DOI: 10.1109/AFGR.2000.840633

H. Graf, E. Cosatto, Tony Ezzat


Citations: 47

Abstract

This paper describes techniques for extracting bitmaps of facial parts from videos of a talking person. The goal is to synthesize high-quality, photo-realistic talking heads that show picture-perfect appearance and realistic head movements with good lip-sound synchronization. To synthesize a talking head, bitmaps of facial parts are combined to form whole heads, and sequences of such images are then integrated with audio from a text-to-speech synthesizer. For a seamless integration of facial parts into an animation, their shape and visual appearance must be known with high accuracy. The recognition system must not only find the locations of facial features, but also determine the head's orientation and recognize the facial expressions. Our face recognition proceeds in multiple steps, each with increased precision. Using motion, color and shape information, the head's position and the locations of the main facial features are determined first. Smaller areas are then searched with matched filters in order to identify specific facial features with high precision. From this information the head's 3D orientation is calculated. Facial parts are cut from the image and, using the head's orientation, are warped into bitmaps with 'normalized' orientation and scale.
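The abstract mentions that small areas are searched with matched filters but does not say how the filters are built or scored. As an illustration only, the following minimal sketch assumes a normalized cross-correlation score and an exhaustive scan over a small grayscale search region. The function name `matched_filter_search` and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def matched_filter_search(region, template):
    """Return (row, col, score) of the best normalized cross-correlation match.

    Both arguments are 2D grayscale arrays. (row, col) is the top-left
    corner of the best-matching window inside `region`.
    Assumption: normalized cross-correlation stands in for the paper's
    unspecified matched filters.
    """
    region = np.asarray(region, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    rh, rw = region.shape
    if th > rh or tw > rw:
        raise ValueError("template must fit inside the search region")

    # Zero-mean template, so the score ignores brightness offsets.
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    if t_norm == 0:
        raise ValueError("template must not be constant")

    best = (0, 0, -np.inf)
    for r in range(rh - th + 1):
        for c in range(rw - tw + 1):
            w = region[r:r + th, c:c + tw]
            w = w - w.mean()
            w_norm = np.linalg.norm(w)
            if w_norm == 0:
                continue  # flat window: correlation is undefined, skip it
            score = float((w * t).sum() / (w_norm * t_norm))
            if score > best[2]:
                best = (r, c, score)
    return best


# Synthetic example: hide a small pattern in a noisy patch and recover it.
rng = np.random.default_rng(0)
template = rng.random((5, 7))
region = rng.random((40, 50)) * 0.5
region[12:17, 20:27] = template
row, col, score = matched_filter_search(region, template)
```

Restricting the scan to a small region, as the abstract describes, keeps the exhaustive search cheap. The coarse stage (motion, color and shape) would supply that region in a full system.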
