{"title":"Constructing Spatio-Temporal Graphs for Face Forgery Detection","authors":"Zhihua Shang, Hongtao Xie, Lingyun Yu, Zhengjun Zha, Yongdong Zhang","doi":"https://dl.acm.org/doi/10.1145/3580512","DOIUrl":null,"url":null,"abstract":"<p>Recently, advanced development of facial manipulation techniques threatens web information security, thus, face forgery detection attracts a lot of attention. It is clear that both spatial and temporal information of facial videos contains the crucial manipulation traces, which are inevitably created during the generation process. However, most existing face forgery detectors only focus on the spatial artifacts or the temporal incoherence, and they are struggling to learn a significant and general kind of representations for manipulated facial videos. In this work, we propose to construct spatial-temporal graphs for fake videos to capture the spatial inconsistency and the temporal incoherence at the same time. To model the spatial-temporal relationship among the graph nodes, a novel forgery detector named Spatio-Temporal Graph Network (STGN) is proposed, which contains two kinds of graph-convolution-based units, the Spatial Relation Graph Unit (SRGU) and the Temporal Attention Graph Unit (TAGU). To exploit spatial information, the SRGU models the inconsistency between each pair of patches in the same frame, instead of focusing on the low-level local spatial artifacts which are vulnerable to samples created by unseen manipulation methods. And, the TAGU is proposed to model the long-distance temporal relation among the patches at the same spatial position in different frames with a graph attention mechanism based on the inter-node similarity. With the SRGU and the TAGU, our STGN can combine the discriminative power of spatial inconsistency and the generalization capacity of temporal incoherence for face forgery detection. Our STGN achieves state-of-the-art performances on several popular forgery detection datasets. Extensive experiments demonstrate both the superiority of our STGN on intra manipulation evaluation and the effectiveness for new sorts of face forgery videos on cross manipulation evaluation.</p>","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"43 21","pages":""},"PeriodicalIF":2.6000,"publicationDate":"2023-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on the Web","FirstCategoryId":"94","ListUrlMain":"https://doi.org/https://dl.acm.org/doi/10.1145/3580512","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
The rapid advance of facial manipulation techniques threatens web information security, so face forgery detection has attracted considerable attention. Both the spatial and the temporal information of facial videos contain crucial manipulation traces, which are inevitably introduced during the generation process. However, most existing face forgery detectors focus only on spatial artifacts or on temporal incoherence, and they struggle to learn discriminative and general representations of manipulated facial videos. In this work, we propose to construct spatio-temporal graphs for fake videos in order to capture spatial inconsistency and temporal incoherence at the same time. To model the spatio-temporal relationships among the graph nodes, we propose a novel forgery detector named Spatio-Temporal Graph Network (STGN), which contains two kinds of graph-convolution-based units: the Spatial Relation Graph Unit (SRGU) and the Temporal Attention Graph Unit (TAGU). To exploit spatial information, the SRGU models the inconsistency between each pair of patches in the same frame, rather than focusing on low-level local spatial artifacts, which are vulnerable to samples created by unseen manipulation methods. The TAGU models the long-distance temporal relations among patches at the same spatial position in different frames with a graph attention mechanism based on inter-node similarity. With the SRGU and the TAGU, our STGN combines the discriminative power of spatial inconsistency with the generalization capacity of temporal incoherence for face forgery detection. STGN achieves state-of-the-art performance on several popular forgery detection datasets. Extensive experiments demonstrate both the superiority of STGN in intra-manipulation evaluation and its effectiveness on new kinds of face forgery videos in cross-manipulation evaluation.
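To make the high-level description above concrete, the following is a minimal, hypothetical PyTorch sketch of the two kinds of units as the abstract describes them: a spatial unit that builds a similarity-based graph over the patches of each frame, and a temporal unit that applies graph attention over the patches at the same spatial position across frames. All shapes, layer sizes, class names, and the exact adjacency/attention formulas here are assumptions made for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of SRGU/TAGU-style units; the paper's real STGN may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialRelationGraphUnit(nn.Module):
    """Graph convolution over the patches of each frame.

    Each node is a patch feature; a dense adjacency is built from pairwise
    patch similarity (an assumed formulation) so that the relation between
    every pair of patches in the same frame is modeled explicitly.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, patches, dim) -- one spatial graph per frame
        sim = torch.einsum("bfpd,bfqd->bfpq", x, x)      # pairwise patch similarity
        adj = F.softmax(sim, dim=-1)                     # row-normalized adjacency
        out = torch.einsum("bfpq,bfqd->bfpd", adj, x)    # propagate features over the graph
        return F.relu(self.proj(out)) + x                # residual node update


class TemporalAttentionGraphUnit(nn.Module):
    """Graph attention over patches at the same spatial position across frames."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, patches, dim) -> attend along the frame axis per patch position
        xt = x.transpose(1, 2)                           # (batch, patches, frames, dim)
        attn = F.softmax(
            torch.einsum("bpfd,bpgd->bpfg", self.q(xt), self.k(xt)) / xt.shape[-1] ** 0.5,
            dim=-1,
        )                                                # similarity-based attention weights
        out = torch.einsum("bpfg,bpgd->bpfd", attn, self.v(xt))
        return (out + xt).transpose(1, 2)                # back to (batch, frames, patches, dim)


if __name__ == "__main__":
    feats = torch.randn(2, 8, 16, 64)                    # 2 clips, 8 frames, 16 patches, 64-d features
    feats = SpatialRelationGraphUnit(64)(feats)
    feats = TemporalAttentionGraphUnit(64)(feats)
    logits = nn.Linear(64, 2)(feats.mean(dim=(1, 2)))    # pooled features -> real-vs-fake logits
    print(logits.shape)                                  # torch.Size([2, 2])
```

In this sketch the two units are simply chained, so each node representation first aggregates intra-frame relations and then long-range temporal relations; how the real STGN composes and stacks its units is described in the paper itself.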
Journal description:
ACM Transactions on the Web (TWEB) is a journal publishing refereed articles that report the results of research on Web content, applications, use, and related enabling technologies. Topics in the scope of TWEB include but are not limited to the following: Browsers and Web Interfaces; Electronic Commerce; Electronic Publishing; Hypertext and Hypermedia; Semantic Web; Web Engineering; Web Services; Service-Oriented Computing; and XML.
In addition, papers addressing the intersection of the following broader technologies with the Web are also in scope: Accessibility; Business Services; Education; Knowledge Management and Representation; Mobility and Pervasive Computing; Performance and Scalability; Recommender Systems; Searching, Indexing, Classification, Retrieval and Querying; Data Mining and Analysis; Security and Privacy; and User Interfaces.
Papers discussing specific Web technologies and applications, as well as the generation, management, and use of Web content, are within scope. Papers describing novel applications of the Web and papers on the underlying technologies are also welcome.