Explainable Dual-Stream Attention Network for Image Forgery Detection and Localisation Using Contrastive Learning

IF 1.5 | Zone 4 (Management) | JCR Q3 | Engineering, Electrical & Electronic

Iet Radar Sonar and Navigation Pub Date : 2025-08-05 DOI:10.1049/rsn2.70064

Maryam Munawar, Mourad Oussalah


Abstract

Image forgery detection aims to identify tampered content and localise manipulated regions within images. With the rise of advanced editing tools, forgeries pose serious challenges across media, law and scientific domains. Existing CNN-based models struggle to detect subtle manipulations that mimic natural image patterns. To address this challenge, we propose a dual-stream contrastive learning network (DSCL-Net) that jointly exploits spatial (pixel-level) and frequency (noise-level) cues. The architecture employs two ResNet-50 encoders: one processes the red–green–blue (RGB) image to capture semantic context, whereas the other processes a spatial rich model (SRM) filtered version to extract high-frequency forensic traces. A multi-scale attention fusion module enhances manipulation-sensitive features. The network has three heads: a classification head for image-level prediction, a segmentation head for pixel-wise localisation, and a contrastive projection head to improve feature discrimination. We validate the proposed model on two benchmark datasets, where DSCL-Net surpasses previous state-of-the-art methods, achieving an image-level accuracy of 97.9% on CASIA and 89.8% on IMD2020. At the pixel level, it attains an F1-score of 92.7% and an AUC of 91.2% on CASIA, and an F1-score of 86.6% and an AUC of 90.1% on IMD2020. Furthermore, LIME and SHAP are used to provide explainability at the individual-image level, helping to assess how the predicted mask aligns with the ground-truth mask. The approach contributes to the development of safe technology for dealing with misinformation and fake news.
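To illustrate the noise-stream input described in the abstract, here is a minimal sketch of SRM-style high-pass filtering in plain Python. It rests on assumptions: the abstract does not say which SRM kernels DSCL-Net uses, so the sketch applies the 5x5 "KV" kernel that is common in SRM-based forensics, and the names `srm_residual` and `make_test_images` are hypothetical helpers, not the authors' code.

```python
# Assumption: the 5x5 "KV" kernel, a high-pass kernel often used in
# SRM-based forensics. The paper's actual kernel set is not given here.
KV = [
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
]
KV_SCALE = 12.0


def srm_residual(image):
    """Filter a 2D grayscale image (list of rows) with the KV kernel.

    Only positions where the kernel fits entirely inside the image are
    computed, so the output is 4 pixels smaller in each dimension.
    """
    height, width = len(image), len(image[0])
    k = len(KV)
    residual = []
    for y in range(height - k + 1):
        row = []
        for x in range(width - k + 1):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    acc += KV[dy][dx] * image[y + dy][x + dx]
            row.append(acc / KV_SCALE)
        residual.append(row)
    return residual


def make_test_images(size=9):
    """Hypothetical toy data: a smooth ramp and a copy with one altered pixel."""
    smooth = [[float(x + 2 * y) for x in range(size)] for y in range(size)]
    tampered = [row[:] for row in smooth]
    tampered[size // 2][size // 2] += 10.0  # simulate a local inconsistency
    return smooth, tampered


smooth, tampered = make_test_images()
res_smooth = srm_residual(smooth)
res_tampered = srm_residual(tampered)

# Largest absolute residual in each filtered image.
max_smooth = max(abs(v) for row in res_smooth for v in row)
max_tampered = max(abs(v) for row in res_tampered for v in row)
print("max residual (smooth):  ", max_smooth)
print("max residual (tampered):", max_tampered)
```

Because the kernel coefficients sum to zero and are symmetric, smooth content such as the linear ramp is suppressed, while the single perturbed pixel leaves a strong residual. This is the kind of high-frequency trace a noise stream is meant to expose. In a full pipeline this filtering would typically run as a fixed convolution layer ahead of the second encoder.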


Source journal

Iet Radar Sonar and Navigation (Engineering & Technology: Telecommunications)

CiteScore: 4.10
Self-citation rate: 11.80%
Articles published: 137
Review time: 3.4 months

About the journal: IET Radar, Sonar & Navigation covers the theory and practice of systems and signals for radar, sonar, radiolocation, navigation, and surveillance purposes, in aerospace and terrestrial applications. Examples include advances in waveform design, clutter and detection, electronic warfare, adaptive array and superresolution methods, tracking algorithms, synthetic aperture, and target recognition techniques.