{"title":"Constructing Spatio-Temporal Graphs for Face Forgery Detection","authors":"Zhihua Shang, Hongtao Xie, Lingyun Yu, Zhengjun Zha, Yongdong Zhang","doi":"https://dl.acm.org/doi/10.1145/3580512","DOIUrl":null,"url":null,"abstract":"<p>Recently, advanced development of facial manipulation techniques threatens web information security, thus, face forgery detection attracts a lot of attention. It is clear that both spatial and temporal information of facial videos contains the crucial manipulation traces, which are inevitably created during the generation process. However, most existing face forgery detectors only focus on the spatial artifacts or the temporal incoherence, and they are struggling to learn a significant and general kind of representations for manipulated facial videos. In this work, we propose to construct spatial-temporal graphs for fake videos to capture the spatial inconsistency and the temporal incoherence at the same time. To model the spatial-temporal relationship among the graph nodes, a novel forgery detector named Spatio-Temporal Graph Network (STGN) is proposed, which contains two kinds of graph-convolution-based units, the Spatial Relation Graph Unit (SRGU) and the Temporal Attention Graph Unit (TAGU). To exploit spatial information, the SRGU models the inconsistency between each pair of patches in the same frame, instead of focusing on the low-level local spatial artifacts which are vulnerable to samples created by unseen manipulation methods. And, the TAGU is proposed to model the long-distance temporal relation among the patches at the same spatial position in different frames with a graph attention mechanism based on the inter-node similarity. With the SRGU and the TAGU, our STGN can combine the discriminative power of spatial inconsistency and the generalization capacity of temporal incoherence for face forgery detection. Our STGN achieves state-of-the-art performances on several popular forgery detection datasets. Extensive experiments demonstrate both the superiority of our STGN on intra manipulation evaluation and the effectiveness for new sorts of face forgery videos on cross manipulation evaluation.</p>","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"43 21","pages":""},"PeriodicalIF":2.6000,"publicationDate":"2023-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on the Web","FirstCategoryId":"94","ListUrlMain":"https://doi.org/https://dl.acm.org/doi/10.1145/3580512","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
The rapid advance of facial manipulation techniques threatens web information security, so face forgery detection has attracted considerable attention. Both the spatial and the temporal information of facial videos contain crucial manipulation traces, which are inevitably introduced during the generation process. However, most existing face forgery detectors focus only on spatial artifacts or on temporal incoherence, and they struggle to learn discriminative and general representations of manipulated facial videos. In this work, we propose to construct spatio-temporal graphs for fake videos in order to capture spatial inconsistency and temporal incoherence at the same time. To model the spatio-temporal relationships among the graph nodes, we propose a novel forgery detector named Spatio-Temporal Graph Network (STGN), which contains two kinds of graph-convolution-based units: the Spatial Relation Graph Unit (SRGU) and the Temporal Attention Graph Unit (TAGU). To exploit spatial information, the SRGU models the inconsistency between each pair of patches in the same frame, rather than focusing on low-level local spatial artifacts, which are vulnerable to samples created by unseen manipulation methods. The TAGU models the long-distance temporal relations among patches at the same spatial position in different frames with a graph attention mechanism based on inter-node similarity. With the SRGU and the TAGU, our STGN combines the discriminative power of spatial inconsistency with the generalization capacity of temporal incoherence for face forgery detection. STGN achieves state-of-the-art performance on several popular forgery detection datasets. Extensive experiments demonstrate both the superiority of STGN in intra-manipulation evaluation and its effectiveness on new kinds of face forgery videos in cross-manipulation evaluation.
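To make the high-level description above concrete, the following is a minimal, hypothetical PyTorch sketch of the two kinds of units as the abstract describes them: a spatial unit that builds a similarity-based graph over the patches of each frame, and a temporal unit that applies graph attention over the patches at the same spatial position across frames. All shapes, layer sizes, class names, and the exact adjacency/attention formulas here are assumptions made for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of SRGU/TAGU-style units; the paper's real STGN may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialRelationGraphUnit(nn.Module):
    """Graph convolution over the patches of each frame.

    Each node is a patch feature; a dense adjacency is built from pairwise
    patch similarity (an assumed formulation) so that the relation between
    every pair of patches in the same frame is modeled explicitly.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, patches, dim) -- one spatial graph per frame
        sim = torch.einsum("bfpd,bfqd->bfpq", x, x)      # pairwise patch similarity
        adj = F.softmax(sim, dim=-1)                     # row-normalized adjacency
        out = torch.einsum("bfpq,bfqd->bfpd", adj, x)    # propagate features over the graph
        return F.relu(self.proj(out)) + x                # residual node update


class TemporalAttentionGraphUnit(nn.Module):
    """Graph attention over patches at the same spatial position across frames."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, patches, dim) -> attend along the frame axis per patch position
        xt = x.transpose(1, 2)                           # (batch, patches, frames, dim)
        attn = F.softmax(
            torch.einsum("bpfd,bpgd->bpfg", self.q(xt), self.k(xt)) / xt.shape[-1] ** 0.5,
            dim=-1,
        )                                                # similarity-based attention weights
        out = torch.einsum("bpfg,bpgd->bpfd", attn, self.v(xt))
        return (out + xt).transpose(1, 2)                # back to (batch, frames, patches, dim)


if __name__ == "__main__":
    feats = torch.randn(2, 8, 16, 64)                    # 2 clips, 8 frames, 16 patches, 64-d features
    feats = SpatialRelationGraphUnit(64)(feats)
    feats = TemporalAttentionGraphUnit(64)(feats)
    logits = nn.Linear(64, 2)(feats.mean(dim=(1, 2)))    # pooled features -> real-vs-fake logits
    print(logits.shape)                                  # torch.Size([2, 2])
```

In this sketch the two units are simply chained, so each node representation first aggregates intra-frame relations and then long-range temporal relations; how the real STGN composes and stacks its units is described in the paper itself.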
Journal description:
ACM Transactions on the Web (TWEB) is a journal publishing refereed articles that report the results of research on Web content, applications, use, and related enabling technologies. Topics in the scope of TWEB include but are not limited to the following: Browsers and Web Interfaces; Electronic Commerce; Electronic Publishing; Hypertext and Hypermedia; Semantic Web; Web Engineering; Web Services; Service-Oriented Computing; and XML.
In addition, papers addressing the intersection of the following broader technologies with the Web are also in scope: Accessibility; Business Services; Education; Knowledge Management and Representation; Mobility and Pervasive Computing; Performance and Scalability; Recommender Systems; Searching, Indexing, Classification, Retrieval and Querying; Data Mining and Analysis; Security and Privacy; and User Interfaces.
Papers discussing specific Web technologies and applications, as well as the generation, management, and use of Web content, are within scope. Papers describing novel applications of the Web and papers on the underlying technologies are also welcome.