Edge-aware graph reasoning network for image manipulation localization
Author: Ruyi Bai
DOI: 10.1016/j.cviu.2025.104490
Journal: Computer Vision and Image Understanding, Volume 260, Article 104490 (IF 3.5, JCR Q2, Computer Science, Artificial Intelligence)
Published: 2025-09-03
URL: https://www.sciencedirect.com/science/article/pii/S1077314225002139
Citations: 0
Abstract
Convolutional networks remain the dominant approach in current research on image manipulation localization. However, their inherently limited receptive field causes the network to focus primarily on local feature extraction, largely ignoring long-range contextual information. To overcome this limitation, researchers have proposed methods such as multi-scale feature fusion, self-attention mechanisms, and pyramid pooling. Nevertheless, in practice these methods frequently suffer from feature coupling between tampered and non-tampered regions, which constrains model performance. To address these issues, we propose an innovative edge-aware graph reasoning network (EGRNet). The core advantage of this network is that it effectively enhances the similarity of features within the tampered region while weakening the information interaction between the tampered and non-tampered regions, thus enabling precise localization of the tampered region. The network employs dual-stream encoders, comprising an RGB encoder and an SRM encoder, to extract visual and noise features, respectively. It then applies spatial pyramid graph reasoning to fuse features at different scales. The graph reasoning architecture comprises two components: the Cross Graph Convolution Feature Fusion Module (CGCFFM) and the Edge-aware Graph Attention Module (EGAM). CGCFFM achieves adaptive fusion of visual and noise features by performing cross spatial graph convolution and channel graph convolution operations. Graph convolution integrates the two-stream features by mapping them to a novel low-dimensional space capable of modeling long-range contextual relationships.
EGAM applies binary edge information to the graph attention adjacency matrix, enabling the network to explicitly model the inconsistency between the tampered and non-tampered regions.
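The graph-reasoning fusion described above can be sketched in simplified NumPy form. This is an illustrative assumption, not the paper's implementation: the soft node-assignment projection, the fully connected node adjacency, and all shapes and weights below are hypothetical stand-ins for how two-stream features might be mapped to a low-dimensional node space and reasoned over with graph convolution.

```python
import numpy as np

rng = np.random.default_rng(0)

def graph_project(x, w_proj):
    # Map N pixel features (N, C) onto K graph nodes via a softmax
    # soft assignment, then aggregate pixels into node features.
    assign = x @ w_proj                                  # (N, K) logits
    assign = np.exp(assign - assign.max(axis=1, keepdims=True))
    assign /= assign.sum(axis=1, keepdims=True)          # rows sum to 1
    nodes = assign.T @ x                                 # (K, C) node features
    return nodes, assign

def graph_conv(nodes, adj, w):
    # One graph-convolution step: row-normalized neighbor
    # aggregation followed by a linear map and ReLU.
    deg = adj.sum(axis=1, keepdims=True)
    return np.maximum((adj / deg) @ nodes @ w, 0.0)

# Toy fusion: concatenate RGB-stream and noise-stream features
# channel-wise, reason over K nodes, then project back to pixels
# with the same soft assignment.
N, C, K = 64, 8, 4
rgb_feat = rng.normal(size=(N, C))
noise_feat = rng.normal(size=(N, C))
fused = np.concatenate([rgb_feat, noise_feat], axis=1)   # (N, 2C)

w_proj = rng.normal(size=(2 * C, K)) * 0.1
w_gcn = rng.normal(size=(2 * C, 2 * C)) * 0.1
adj = np.ones((K, K))                                    # fully connected nodes

nodes, assign = graph_project(fused, w_proj)
nodes = graph_conv(nodes, adj, w_gcn)
out = assign @ nodes                                     # (N, 2C) back to pixels
```

Reasoning in the K-node space rather than over all N pixels is what makes long-range context tractable: every node aggregates pixels scattered across the image, so one graph-convolution step already mixes distant locations.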
EGAM effectively suppresses interference from the non-tampered region on the tampered region, thereby improving the network's ability to recognize tampered regions. To validate the performance of EGRNet, extensive experiments are conducted on several challenging benchmark datasets. The experimental results indicate that our method outperforms current state-of-the-art image manipulation localization methods in both qualitative and quantitative evaluations. Furthermore, EGRNet demonstrates strong adaptability to different types of tampering and robustness to various attacks.
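The edge-gating idea behind EGAM can be sketched as follows. This is a minimal NumPy illustration under assumed inputs (the binary region labels standing in for the edge map, and dot-product attention scores are hypothetical simplifications, not the paper's module): attention entries between nodes on opposite sides of a detected edge are masked out, so tampered and non-tampered regions stop exchanging information.

```python
import numpy as np

def edge_aware_attention(nodes, region_mask):
    # Graph attention whose adjacency is gated by a binary region mask:
    # nodes in different regions (separated by a detected edge) do not
    # attend to each other, suppressing cross-region interference.
    scores = nodes @ nodes.T / np.sqrt(nodes.shape[1])         # (K, K)
    same_region = region_mask[:, None] == region_mask[None, :]  # (K, K) bool
    scores = np.where(same_region, scores, -np.inf)            # cut cross links
    scores -= scores.max(axis=1, keepdims=True)                # stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ nodes, attn

rng = np.random.default_rng(1)
K, C = 6, 8
nodes = rng.normal(size=(K, C))
# Hypothetical binary labels derived from an edge map:
# 1 = tampered-side node, 0 = authentic-side node.
region = np.array([1, 1, 1, 0, 0, 0])
out, attn = edge_aware_attention(nodes, region)
```

After masking, each row of the attention matrix distributes its weight only among same-region nodes, which is the "explicitly model the inconsistency" behavior the abstract describes: feature similarity is reinforced within a region and interaction across the edge is zeroed.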
Journal introduction:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems