Navigating large-pose challenge for high-fidelity face reenactment with video diffusion model

IF 2.8 · CAS Region 4 (Computer Science) · JCR Q2 (Computer Science, Software Engineering)
Mingtao Guo, Guanyu Xing, Yanci Zhang, Yanli Liu
{"title":"Navigating large-pose challenge for high-fidelity face reenactment with video diffusion model","authors":"Mingtao Guo ,&nbsp;Guanyu Xing ,&nbsp;Yanci Zhang ,&nbsp;Yanli Liu","doi":"10.1016/j.cag.2025.104423","DOIUrl":null,"url":null,"abstract":"<div><div>Face reenactment aims to generate realistic talking head videos by transferring motion from a driving video to a static source image while preserving the source identity. Although existing methods based on either implicit or explicit keypoints have shown promise, they struggle with large pose variations due to warping artifacts or the limitations of coarse facial landmarks. In this paper, we present the Face Reenactment Video Diffusion model (FRVD), a novel framework for high-fidelity face reenactment under large pose changes. Our method first employs a motion extractor to extract implicit facial keypoints from the source and driving images to represent fine-grained motion and to perform motion alignment through a warping module. To address the degradation introduced by warping, we introduce a Warping Feature Mapper (WFM) that maps the warped source image into the motion-aware latent space of a pretrained image-to-video (I2V) model. This latent space encodes rich priors of facial dynamics learned from large-scale video data, enabling effective warping correction and enhancing temporal coherence. Extensive experiments show that FRVD achieves superior performance over existing methods in terms of pose accuracy, identity preservation, and visual quality, especially in challenging scenarios with extreme pose variations.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"132 ","pages":"Article 104423"},"PeriodicalIF":2.8000,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Graphics-Uk","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S009784932500264X","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract

Face reenactment aims to generate realistic talking head videos by transferring motion from a driving video to a static source image while preserving the source identity. Although existing methods based on either implicit or explicit keypoints have shown promise, they struggle with large pose variations due to warping artifacts or the limitations of coarse facial landmarks. In this paper, we present the Face Reenactment Video Diffusion model (FRVD), a novel framework for high-fidelity face reenactment under large pose changes. Our method first employs a motion extractor to extract implicit facial keypoints from the source and driving images to represent fine-grained motion and to perform motion alignment through a warping module. To address the degradation introduced by warping, we introduce a Warping Feature Mapper (WFM) that maps the warped source image into the motion-aware latent space of a pretrained image-to-video (I2V) model. This latent space encodes rich priors of facial dynamics learned from large-scale video data, enabling effective warping correction and enhancing temporal coherence. Extensive experiments show that FRVD achieves superior performance over existing methods in terms of pose accuracy, identity preservation, and visual quality, especially in challenging scenarios with extreme pose variations.
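
For readers who want a concrete picture of the pipeline, the following is a minimal PyTorch-style sketch of the architecture as described in the abstract. All module names and interfaces here (the motion_extractor, warping_module, wfm, and the i2v.sample call) are hypothetical placeholders inferred from the abstract, not the authors' released code.

```python
import torch
import torch.nn as nn

class FRVD(nn.Module):
    """Sketch of the reenactment pipeline described in the abstract."""

    def __init__(self, motion_extractor, warping_module, wfm, i2v_denoiser):
        super().__init__()
        self.motion_extractor = motion_extractor  # implicit facial keypoints from one image
        self.warping_module = warping_module      # aligns source to driving motion
        self.wfm = wfm                            # Warping Feature Mapper -> I2V latent space
        self.i2v = i2v_denoiser                   # frozen pretrained image-to-video diffusion model

    def forward(self, source_img, driving_frames):
        # 1) Fine-grained motion: implicit keypoints from the source and each driving frame.
        kp_src = self.motion_extractor(source_img)
        kp_drv = [self.motion_extractor(frame) for frame in driving_frames]

        # 2) Motion alignment: warp the source toward each driving pose
        #    (warping artifacts are expected here, especially at large poses).
        warped = [self.warping_module(source_img, kp_src, kp_d) for kp_d in kp_drv]

        # 3) The WFM maps each warped image into the motion-aware latent space
        #    of the I2V prior, where warping degradation can be corrected.
        cond_latents = torch.stack([self.wfm(w) for w in warped], dim=1)  # (B, T, C, h, w)

        # 4) The I2V diffusion prior, conditioned on these latents and the
        #    source identity, samples a temporally coherent video.
        return self.i2v.sample(source_img, cond_latents)
```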
Source journal
Computers & Graphics-Uk (Engineering & Technology: Computer Science, Software Engineering)
CiteScore: 5.30
Self-citation rate: 12.00%
Annual articles: 173
Review time: 38 days
Journal introduction
Computers & Graphics is dedicated to disseminating information on research and applications of computer graphics (CG) techniques. The journal encourages articles on:
1. Research and applications of interactive computer graphics. We are particularly interested in novel interaction techniques and applications of CG to problem domains.
2. State-of-the-art papers on late-breaking, cutting-edge research on CG.
3. Information on innovative uses of graphics principles and technologies.
4. Tutorial papers on both teaching CG principles and innovative uses of CG in education.