Face forgery video detection based on expression key sequences

Impact factor 5.2; Zone 2 (Computer Science); JCR Q1 (Computer Science, Information Systems)
Yameng Tu, Jianbin Wu, Liang Lu, Shuaikang Gao, MingHao Li
Journal of King Saud University - Computer and Information Sciences, Volume 36, Issue 7, Article 102142
Published: 2024-07-23
DOI: 10.1016/j.jksuci.2024.102142
URL: https://www.sciencedirect.com/science/article/pii/S1319157824002313

Abstract

To reduce the computational cost of detecting forged videos while improving detection accuracy, this paper uses dynamic facial expression sequences as key sequences, which replace the original video sequences as input to the detection model. A spatio-temporal dual-branch detection network is designed on the vision Transformer architecture. The method has three steps. First, dynamic facial expression sequences are localized as key sequences using optical flow difference algorithms. Second, the spatial branch uses a focal self-attention mechanism to attend to dynamic features of expression-relevant regions and uses Factorization Machines to model feature interactions among multiple key sequences, while the temporal branch learns the temporal inconsistency of optical flow differences between adjacent frames. Finally, a binary linear SVM combines the Softmax outputs of the two branches to produce the final detection result. Experiments on the FaceForensics++ dataset show that (a) replacing whole video sequences with facial expression key sequences reduces training and detection time by nearly 80% and 90%, respectively; and (b) the proposed approach achieves more competitive detection accuracy than state-of-the-art methods that rely on random sequence/frame extraction or on key frame extraction based on video compression techniques.
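
The abstract does not spell out the optical flow difference algorithm, so the following is only a rough sketch of the key-sequence idea under stated assumptions, not the authors' method. It assumes each frame has already been reduced to one scalar motion score (for example, the mean optical-flow magnitude inside the face region), and it marks a run of frames as a key sequence when the score changes by more than a threshold between adjacent frames. The function name `locate_key_sequences` and the parameters `threshold` and `min_length` are hypothetical.

```python
from typing import List, Tuple


def locate_key_sequences(
    motion: List[float],
    threshold: float,
    min_length: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end inclusive, of candidate key sequences.

    motion[t] is an assumed per-frame motion score (e.g. mean optical-flow
    magnitude in the face region). This is an illustrative heuristic only.
    """
    # Absolute change of the motion score between adjacent frames t and t + 1.
    diffs = [abs(b - a) for a, b in zip(motion, motion[1:])]

    sequences: List[Tuple[int, int]] = []
    start = None  # index of the first frame of the run currently being tracked

    for t, d in enumerate(diffs):
        if d > threshold:
            if start is None:
                start = t
        elif start is not None:
            # Differences start..t-1 were active, so the run covers frames start..t.
            if t - start + 1 >= min_length:
                sequences.append((start, t))
            start = None

    # Close a run that is still open when the video ends.
    if start is not None:
        end = len(diffs)  # index of the last frame
        if end - start + 1 >= min_length:
            sequences.append((start, end))

    return sequences


if __name__ == "__main__":
    scores = [0.1, 0.1, 0.1, 0.6, 1.2, 0.5, 0.5, 0.5, 0.5, 1.0, 0.4]
    print(locate_key_sequences(scores, threshold=0.3))
```

Only the frames inside the returned index ranges would then be passed to the detection network, which is where the reported savings in training and detection time would come from. A real implementation would need a dense optical flow estimator to produce the per-frame scores.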

Source journal
CiteScore: 10.50
Self-citation rate: 8.70%
Articles published: 656
Review time: 29 days
About the journal: In 2022 the Journal of King Saud University - Computer and Information Sciences will become an author paid open access journal. Authors who submit their manuscript after October 31st 2021 will be asked to pay an Article Processing Charge (APC) after acceptance of their paper to make their work immediately, permanently, and freely accessible to all. The Journal of King Saud University Computer and Information Sciences is a refereed, international journal that covers all aspects of both foundations of computer and its practical applications.