Explainable Dual-Stream Attention Network for Image Forgery Detection and Localisation Using Contrastive Learning
Authors: Maryam Munawar, Mourad Oussalah
Journal: IET Radar, Sonar & Navigation, 19(1)
Publication date: 2025-08-05
Publication type: Journal Article
DOI: 10.1049/rsn2.70064
URL: https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/rsn2.70064
PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/rsn2.70064
Impact factor: 1.5 (JCR Q3, Engineering, Electrical & Electronic)
Citations: 0
Abstract
Image forgery detection aims to identify tampered content and localise manipulated regions within images. With the rise of advanced editing tools, forgeries pose serious challenges across media, law and scientific domains. Existing CNN-based models struggle to detect subtle manipulations that mimic natural image patterns. To address this challenge, we propose a dual-stream contrastive learning network (DSCL-Net) that jointly exploits spatial (pixel-level) and frequency (noise-level) cues. The architecture employs two ResNet-50 encoders: one processes the red–green–blue (RGB) image to capture semantic context, whereas the other processes a version filtered with the spatial rich model (SRM) to extract high-frequency forensic traces. A multi-scale attention fusion module enhances manipulation-sensitive features. The network includes three heads: a classification head for image-level prediction, a segmentation head for pixel-wise localisation, and a contrastive projection head to improve feature discrimination. We validate the proposed model on two benchmark datasets. DSCL-Net surpasses previous state-of-the-art methods, achieving an image-level accuracy of 97.9% on CASIA and 89.8% on IMD2020. At the pixel level, it attains an F1-score of 92.7% and an AUC of 91.2% on CASIA, and an F1-score of 86.6% with an AUC of 90.1% on IMD2020. Furthermore, LIME and SHAP are used to provide explanations at the individual-image level, helping to assess how well the predicted mask aligns with the ground-truth mask. The approach contributes to safe technology for dealing with misinformation and fake news.
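The noise stream described in the abstract takes an SRM-filtered copy of the image as input. As a rough illustration of what such filtering does, the sketch below applies a single 5×5 high-pass kernel widely used in the SRM literature (often called the "KV" kernel) to a toy grayscale image in pure Python. The abstract does not state which SRM kernels DSCL-Net uses, so the kernel choice, the helper name `srm_residual` and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
# Assumption: one 5x5 "KV" high-pass kernel from the SRM literature, scaled by 1/12.
# Its coefficients sum to zero, so constant and linearly varying content cancels.
KV_KERNEL = [
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
]
KV_SCALE = 12.0


def srm_residual(image):
    """Return the unpadded ('valid') KV-filtered residual of a 2-D image.

    `image` is a list of equal-length rows of floats. The output is smaller
    than the input by 4 pixels in each dimension because no padding is used.
    """
    height, width = len(image), len(image[0])
    k = len(KV_KERNEL)
    residual = []
    for y in range(height - k + 1):
        row = []
        for x in range(width - k + 1):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    acc += KV_KERNEL[dy][dx] * image[y + dy][x + dx]
            row.append(acc / KV_SCALE)
        residual.append(row)
    return residual


# Hypothetical toy data: a smooth horizontal gradient stands in for natural
# content, and a brightened 2x2 patch stands in for a pasted region.
smooth_image = [[10.0 * x for x in range(8)] for _ in range(8)]
toy_image = [row[:] for row in smooth_image]
for y in (3, 4):
    for x in (3, 4):
        toy_image[y][x] += 50.0

smooth_residual = srm_residual(smooth_image)
toy_residual = srm_residual(toy_image)

# Largest absolute response in each residual map.
smooth_peak = max(abs(v) for row in smooth_residual for v in row)
toy_peak = max(abs(v) for row in toy_residual for v in row)
print("smooth peak:", smooth_peak)
print("tampered peak:", toy_peak)
```

On the smooth gradient the residual vanishes, while the pasted patch leaves a clear response around its boundary. This is the kind of high-frequency trace a noise-stream encoder could pick up even when the RGB content looks plausible. A real pipeline would typically apply several such kernels per colour channel as a fixed convolution layer ahead of the encoder.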
About the journal:
IET Radar, Sonar & Navigation covers the theory and practice of systems and signals for radar, sonar, radiolocation, navigation, and surveillance purposes, in aerospace and terrestrial applications.
Examples include advances in waveform design, clutter and detection, electronic warfare, adaptive array and superresolution methods, tracking algorithms, synthetic aperture, and target recognition techniques.