Siyou Guo, Qilei Li, Mingliang Gao, Xianxun Zhu, Imad Rida
Image and Vision Computing, Volume 160, Article 105582. Published 2025-05-27. DOI: 10.1016/j.imavis.2025.105582. URL: https://www.sciencedirect.com/science/article/pii/S0262885625001702
Generalizable deepfake detection via Spatial Kernel Selection and Halo Attention Network
The rapid advancement of AI-Generated Content (AIGC) has enabled the unprecedented synthesis of photorealistic facial images. While these technologies offer transformative potential for creative industries, they also introduce significant risks due to the malicious manipulation of visual media. Current deepfake detection methods struggle with unseen forgeries due to their inability to consider the effects of spatial receptive fields and local representation learning. To bridge these gaps, this paper proposes a Spatial Kernel Selection and Halo Attention Network (SKSHA-Net) for deepfake detection. The proposed model incorporates two key modules, namely Spatial Kernel Selection (SKS) and Halo Attention (HA). The SKS module dynamically adjusts the spatial receptive field to capture subtle artifacts indicative of forgery. The HA module focuses on the intricate relationships between neighboring pixels for local representation learning. Comparative experiments on three public datasets demonstrate that SKSHA-Net outperforms the state-of-the-art (SOTA) methods in both intra-testing and cross-testing.
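The abstract gives no implementation details for the HA module, so the following is only a minimal sketch of the general idea of halo-style local attention, not the authors' method. It assumes a single-channel feature map, identity query/key/value projections (a trained model would learn these), and key/value windows clipped at the image border. The function name `halo_attention` and the parameters `block` and `halo` are hypothetical names chosen for illustration.

```python
import math


def halo_attention(feat, block=2, halo=1):
    """Blocked local self-attention over a 2D grid of scalar features.

    Queries come from each block x block tile. Keys and values come from the
    same tile enlarged by `halo` pixels on every side, clipped to the grid.
    Assumption: projections are the identity; nothing here is learned.
    """
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            # Key/value window: the tile plus its halo, clipped at the borders.
            y0, y1 = max(0, by - halo), min(h, by + block + halo)
            x0, x1 = max(0, bx - halo), min(w, bx + block + halo)
            window = [feat[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            for qy in range(by, min(h, by + block)):
                for qx in range(bx, min(w, bx + block)):
                    q = feat[qy][qx]
                    # Dot-product scores (scalars here), then a stable softmax.
                    scores = [q * k for k in window]
                    m = max(scores)
                    weights = [math.exp(s - m) for s in scores]
                    total = sum(weights)
                    # Output is a convex combination of the neighbouring values.
                    out[qy][qx] = sum(
                        wt * v for wt, v in zip(weights, window)
                    ) / total
    return out


if __name__ == "__main__":
    grid = [[0.0, 0.1, 0.2, 0.3],
            [0.1, 0.9, 0.8, 0.2],
            [0.0, 0.7, 1.0, 0.1],
            [0.3, 0.2, 0.1, 0.0]]
    for row in halo_attention(grid, block=2, halo=1):
        print(["%.3f" % v for v in row])
```

Each output pixel mixes only values from its tile and the surrounding halo, which is one way to model relationships between neighboring pixels. Setting `halo=0` restricts attention to the tile itself.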
About the journal:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.