STIDNet: Identity-Aware Face Forgery Detection With Spatiotemporal Knowledge Distillation

Impact Factor 4.5 | JCR Q1, Computer Science, Cybernetics | CAS Region 2, Computer Science
Mingqi Fang;Lingyun Yu;Hongtao Xie;Qingfeng Tan;Zhiyuan Tan;Amir Hussain;Zezheng Wang;Jiahong Li;Zhihong Tian
DOI: 10.1109/TCSS.2024.3356549
Journal: IEEE Transactions on Computational Social Systems
Published: 2024-02-12 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10433236/
Citations: 0

Abstract

The impressive development of facial manipulation techniques has raised severe public concerns. Identity-aware methods, which are especially suitable for protecting celebrities, are seen as a promising face forgery detection approach that uses an additional reference video. However, lacking in-depth observation of fake videos' characteristics, most existing identity-aware algorithms naively imitate face verification models and fail to exploit discriminative information. In this article, we argue that both spatial and temporal perspectives must be considered to obtain adequate inconsistency clues, and we propose a novel forgery detector named the SpatioTemporal IDentity network (STIDNet). To effectively capture heterogeneous spatiotemporal information in a unified formulation, STIDNet follows a knowledge distillation architecture in which a student identity extractor receives supervision from a spatial information encoder (SIE) and a temporal information encoder (TIE) through multiteacher training. Specifically, a region-sensitive identity modeling paradigm is proposed in the SIE: facial blending augmentation is introduced while the identity label is kept uniform, encouraging the model to focus on spatially discriminative regions such as the outer face. Meanwhile, considering the strong temporal correlation between audio and talking-face video, the TIE is devised in a cross-modal pattern in which audio information supervises the model to exploit temporal personalized movements. Benefiting from the knowledge transferred from the SIE and TIE, STIDNet captures an individual's essential spatiotemporal identity attributes and is sensitive to even the subtle identity deviations caused by manipulation. Extensive experiments indicate the superiority of STIDNet over previous works. Moreover, we demonstrate that STIDNet is more suitable for real-world deployment in terms of model complexity and reference set size.
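The abstract describes two mechanisms: a student identity extractor distilled from two teachers (the SIE and TIE), and detection by measuring how far a suspect video's identity embedding deviates from reference videos of the same person. The following is a minimal numpy sketch of those two ideas only; the "networks" are stand-in random linear maps, and all function names, loss weights, and the cosine-distance formulation are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_EMB = 512, 128

# Stand-in "networks": fixed random linear maps from clip features to
# identity embeddings (real SIE/TIE/student would be deep networks).
W_student = rng.normal(size=(D_IN, D_EMB))
W_sie = rng.normal(size=(D_IN, D_EMB))
W_tie = rng.normal(size=(D_IN, D_EMB))

def embed(W, x):
    """L2-normalized identity embedding."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def distill_loss(x, w_spatial=0.5, w_temporal=0.5):
    """Multiteacher distillation objective (assumed form): the student
    embedding is pulled toward both teachers via cosine distance."""
    s = embed(W_student, x)
    t_sie = embed(W_sie, x)
    t_tie = embed(W_tie, x)
    cos = lambda a, b: np.sum(a * b, axis=-1)
    return float(np.mean(w_spatial * (1 - cos(s, t_sie))
                         + w_temporal * (1 - cos(s, t_tie))))

def identity_deviation(test_clip, reference_clips):
    """Detection score: cosine distance between the suspect clip's embedding
    and the mean embedding of the reference set; a large deviation suggests
    the identity attributes were disturbed by manipulation."""
    z_test = embed(W_student, test_clip)
    z_ref = embed(W_student, reference_clips).mean(axis=0)
    z_ref /= np.linalg.norm(z_ref)
    return float(1 - np.dot(z_test, z_ref))

x = rng.normal(size=(8, D_IN))                    # batch of clip features
loss = distill_loss(x)                            # training-time objective
score = identity_deviation(rng.normal(size=D_IN), x)  # test-time score
print(loss, score)
```

At inference time only the student and a reference set are needed, which is consistent with the abstract's claim that model complexity and reference set size matter for real-world deployment.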
Source journal
IEEE Transactions on Computational Social Systems (Social Sciences, miscellaneous)
CiteScore: 10.00
Self-citation rate: 20.00%
Articles per year: 316
About the journal: IEEE Transactions on Computational Social Systems focuses on topics such as the modeling, simulation, analysis, and understanding of social systems from a quantitative and/or computational perspective. "Systems" include man-man, man-machine, and machine-machine organizations and adversarial situations, as well as social media structures and their dynamics. More specifically, the journal publishes articles on modeling the dynamics of social systems, methodologies for incorporating and representing sociocultural and behavioral aspects in computational modeling, analysis of social system behavior and structure, and paradigms for social systems modeling and simulation. The journal also features articles on social network dynamics, social intelligence and cognition, social systems design and architectures, sociocultural modeling and representation, and computational behavior modeling, and their applications.