Talking Face Generation With Lip and Identity Priors

IF 0.9 · CAS Zone 4 (Computer Science) · JCR Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING
Jiajie Wu, Frederick W. B. Li, Gary K. L. Tam, Bailin Yang, Fangzhe Nan, Jiahao Pan
{"title":"有嘴唇和身份先验的说话面孔一代","authors":"Jiajie Wu,&nbsp;Frederick W. B. Li,&nbsp;Gary K. L. Tam,&nbsp;Bailin Yang,&nbsp;Fangzhe Nan,&nbsp;Jiahao Pan","doi":"10.1002/cav.70026","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>Speech-driven talking face video generation has attracted growing interest in recent research. While person-specific approaches yield high-fidelity results, they require extensive training data from each individual speaker. In contrast, general-purpose methods often struggle with accurate lip synchronization, identity preservation, and natural facial movements. To address these limitations, we propose a novel architecture that combines an alignment model with a rendering model. The rendering model synthesizes identity-consistent lip movements by leveraging facial landmarks derived from speech, a partially occluded target face, multi-reference lip features, and the input audio. Concurrently, the alignment model estimates optical flow using the occluded face and a static reference image, enabling precise alignment of facial poses and lip shapes. This collaborative design enhances the rendering process, resulting in more realistic and identity-preserving outputs. Extensive experiments demonstrate that our method significantly improves lip synchronization and identity retention, establishing a new benchmark in talking face video generation.</p>\n </div>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"36 3","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2025-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Talking Face Generation With Lip and Identity Priors\",\"authors\":\"Jiajie Wu,&nbsp;Frederick W. B. Li,&nbsp;Gary K. L. Tam,&nbsp;Bailin Yang,&nbsp;Fangzhe Nan,&nbsp;Jiahao Pan\",\"doi\":\"10.1002/cav.70026\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>Speech-driven talking face video generation has attracted growing interest in recent research. While person-specific approaches yield high-fidelity results, they require extensive training data from each individual speaker. In contrast, general-purpose methods often struggle with accurate lip synchronization, identity preservation, and natural facial movements. To address these limitations, we propose a novel architecture that combines an alignment model with a rendering model. The rendering model synthesizes identity-consistent lip movements by leveraging facial landmarks derived from speech, a partially occluded target face, multi-reference lip features, and the input audio. Concurrently, the alignment model estimates optical flow using the occluded face and a static reference image, enabling precise alignment of facial poses and lip shapes. This collaborative design enhances the rendering process, resulting in more realistic and identity-preserving outputs. 
Extensive experiments demonstrate that our method significantly improves lip synchronization and identity retention, establishing a new benchmark in talking face video generation.</p>\\n </div>\",\"PeriodicalId\":50645,\"journal\":{\"name\":\"Computer Animation and Virtual Worlds\",\"volume\":\"36 3\",\"pages\":\"\"},\"PeriodicalIF\":0.9000,\"publicationDate\":\"2025-05-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Animation and Virtual Worlds\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/cav.70026\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Animation and Virtual Worlds","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cav.70026","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract


Speech-driven talking face video generation has attracted growing interest in recent research. While person-specific approaches yield high-fidelity results, they require extensive training data from each individual speaker. In contrast, general-purpose methods often struggle with accurate lip synchronization, identity preservation, and natural facial movements. To address these limitations, we propose a novel architecture that combines an alignment model with a rendering model. The rendering model synthesizes identity-consistent lip movements by leveraging facial landmarks derived from speech, a partially occluded target face, multi-reference lip features, and the input audio. Concurrently, the alignment model estimates optical flow using the occluded face and a static reference image, enabling precise alignment of facial poses and lip shapes. This collaborative design enhances the rendering process, resulting in more realistic and identity-preserving outputs. Extensive experiments demonstrate that our method significantly improves lip synchronization and identity retention, establishing a new benchmark in talking face video generation.
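To make the described data flow concrete, below is a minimal PyTorch sketch of the two-model design: an alignment model that predicts optical flow to warp a static reference into pose alignment with the occluded target frame, and a rendering model that conditions on the warped reference, speech-derived landmarks, multi-reference lip features, and audio. Every module, channel count, and tensor shape here is an illustrative assumption, not the authors' actual implementation.

```python
# Hypothetical sketch of the alignment + rendering pipeline from the abstract.
# All architectures and dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignmentModel(nn.Module):
    """Estimates dense optical flow between a static reference image and the
    partially occluded target face, then warps the reference so its pose and
    lip shape line up with the target frame."""
    def __init__(self):
        super().__init__()
        # 6 input channels: occluded face (3) + reference image (3).
        self.flow_net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),  # 2 channels: (dx, dy) flow field
        )

    def forward(self, occluded_face, reference):
        flow = self.flow_net(torch.cat([occluded_face, reference], dim=1))
        b, _, h, w = flow.shape
        # Normalized sampling grid in [-1, 1], offset by the predicted flow.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        base = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)
        grid = base + flow.permute(0, 2, 3, 1)
        # Warp the reference toward the target pose.
        return F.grid_sample(reference, grid, align_corners=True)

class RenderingModel(nn.Module):
    """Fuses the occluded target face, the aligned reference, a landmark map
    rendered from speech-derived landmarks, multi-reference lip features, and
    audio features to synthesize the mouth region."""
    def __init__(self, audio_dim=128, lip_dim=64):
        super().__init__()
        # 9 input channels: occluded face + aligned reference + landmark map.
        self.img_enc = nn.Conv2d(9, 64, 3, padding=1)
        self.cond = nn.Linear(audio_dim + lip_dim, 64)
        self.dec = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, occluded_face, aligned_ref, landmark_map,
                audio_feat, lip_feat):
        x = F.relu(self.img_enc(
            torch.cat([occluded_face, aligned_ref, landmark_map], dim=1)))
        c = self.cond(torch.cat([audio_feat, lip_feat], dim=-1))
        x = x + c[:, :, None, None]  # broadcast conditioning over spatial dims
        return torch.sigmoid(self.dec(x))

# Usage with dummy tensors (batch of 2, 128x128 frames):
align, render = AlignmentModel(), RenderingModel()
occ = torch.rand(2, 3, 128, 128)
ref = torch.rand(2, 3, 128, 128)
lmk = torch.rand(2, 3, 128, 128)
aligned = align(occ, ref)
frame = render(occ, aligned, lmk, torch.rand(2, 128), torch.rand(2, 64))
print(frame.shape)  # torch.Size([2, 3, 128, 128])
```

The split mirrors the abstract's collaborative design: the alignment step handles pose and lip-shape registration so the rendering step can focus on identity-consistent texture synthesis, rather than one network solving both at once.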

Source journal: Computer Animation and Virtual Worlds (Engineering & Technology – Computer Science: Software Engineering)
CiteScore: 2.20
Self-citation rate: 0.00%
Articles per year: 90
Review time: 6-12 weeks
Aims and scope: With the advent of very powerful PCs and high-end graphics cards, there has been incredible development in Virtual Worlds, real-time computer animation and simulation, and games. At the same time, new and cheaper Virtual Reality devices have appeared, allowing interaction with these real-time Virtual Worlds and even with real worlds through Augmented Reality. Three-dimensional characters, especially Virtual Humans, are now of exceptional quality, which allows them to be used in the movie industry. But this is only a beginning: with the development of Artificial Intelligence and Agent technology, these characters will become more and more autonomous and even intelligent. They will inhabit the Virtual Worlds in a Virtual Life together with animals and plants.