Xiao Yang, Xiuli Chai, Zhihua Gan, Lvchen Cao, Yushu Zhang
MSHRT-Net: Multi-scale hierarchical residual transfer network for image manipulation detection and localization
Neurocomputing, vol. 648, Article 130788. Published 2025-06-15. Impact factor 5.5 (JCR Q1, Computer Science, Artificial Intelligence).
DOI: 10.1016/j.neucom.2025.130788
https://www.sciencedirect.com/science/article/pii/S0925231225014602
Abstract
The proliferation of malicious image tampering has triggered a crisis of trust in the authenticity of visual content. Existing methods struggle with complex tampering: as forgery techniques advance and manipulations become stealthier, they increasingly fail to detect tampering and to accurately localize the tampered regions. To address this issue, we develop the Multi-Scale Hierarchical Residual Transfer Network (MSHRT-Net), which focuses on extracting edge texture and multi-scale information for efficient image tampering detection and localization. Specifically, the Adaptive Gabor Texture Extractor (AGTE) employs a dual-stream-like structure that extracts edge texture and spatial features in parallel. To enhance the expressiveness of the extracted features, we present the Multi-Scale Hierarchical Residual Module (MSHRM) as the encoder-decoder layer of the backbone network; it captures global and local information via three parallel branches at distinct scales. The Detail-Preserving Skip Module (DPSM), built on skip connections, further improves the network’s feature-capturing capability. To address inconsistencies between features at different scales, we design a Dual-Dimensional Attention Module (DAM), which filters critical information from coarse feature maps while suppressing irrelevant content. Finally, to address tampering-type imbalance in the training data, we propose a class of loss functions that improves the model’s ability to detect and localize various types of tampering. Extensive experiments on multiple datasets demonstrate that our model surpasses previous methods, particularly on the DSO dataset of realistic scenarios (pixel-level AUC 0.993, IoU 0.966).
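The two pixel-level metrics quoted above can be illustrated with a minimal sketch. It assumes a binary ground-truth mask (1 = tampered), a per-pixel score map, and a predicted mask obtained by thresholding those scores. The function names `pixel_iou` and `pixel_auc` are hypothetical, and the abstract does not state the paper's evaluation protocol (threshold choice, per-image versus dataset-level averaging), so this shows only the standard definitions.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]
ScoreMap = Sequence[Sequence[float]]


def pixel_iou(pred_mask: Mask, true_mask: Mask) -> float:
    """Intersection over union of the tampered (value 1) pixels."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as perfect agreement.
    return intersection / union if union else 1.0


def pixel_auc(scores: ScoreMap, true_mask: Mask) -> float:
    """ROC AUC over pixels, using the rank-sum (Mann-Whitney U) formulation."""
    flat = [(s, t) for s_row, t_row in zip(scores, true_mask)
            for s, t in zip(s_row, t_row)]
    n_pos = sum(1 for _, t in flat if t)
    n_neg = len(flat) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both tampered and authentic pixels")
    flat.sort(key=lambda pair: pair[0])
    rank_sum_pos = 0.0
    i = 0
    while i < len(flat):
        # Group tied scores and give each the average of their 1-based ranks.
        j = i
        while j < len(flat) and flat[j][0] == flat[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if flat[k][1])
        i = j
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy 2x2 example (made-up values, not data from the paper).
    truth = [[0, 0], [1, 1]]
    scores = [[0.1, 0.4], [0.35, 0.8]]
    predicted = [[1 if s >= 0.5 else 0 for s in row] for row in scores]
    print("IoU:", pixel_iou(predicted, truth))
    print("AUC:", pixel_auc(scores, truth))
```

IoU depends on the threshold used to binarize the score map, whereas AUC is threshold-free, which is one reason localization results are often reported with both.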
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.