Edge-aware graph reasoning network for image manipulation localization

IF 3.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Ruyi Bai
DOI: 10.1016/j.cviu.2025.104490 | Computer Vision and Image Understanding, Volume 260, Article 104490 | Published 2025-09-03 (Journal Article)
Citations: 0

Abstract

Convolutional networks remain the dominant approach in current research on image manipulation localization. However, their inherent receptive-field limitation causes the network to focus primarily on local feature extraction, largely ignoring the importance of long-range contextual information. To overcome this limitation, researchers have proposed methods such as multi-scale feature fusion, self-attention mechanisms, and pyramid pooling. Nevertheless, in practice these methods frequently suffer from feature coupling between tampered and non-tampered regions, which constrains model performance. To address these issues, we propose an innovative edge-aware graph reasoning network (EGRNet). The core advantage of this network is that it effectively enhances the similarity of features within the tampered region while weakening the information interaction between the tampered and non-tampered regions, thus enabling precise localization of the tampered region. The network employs dual-stream encoders, comprising an RGB encoder and an SRM encoder, to extract visual and noise features, respectively. It then uses spatial pyramid graph reasoning to fuse features at different scales. The graph reasoning architecture comprises two components: the Cross Graph Convolution Feature Fusion Module (CGCFFM) and the Edge-aware Graph Attention Module (EGAM). CGCFFM achieves adaptive fusion of visual and noise features by performing cross spatial graph convolution and channel graph convolution operations. Graph convolution facilitates the integration of the two-stream features by mapping them into a new low-dimensional space capable of modeling long-range contextual relationships. EGAM applies binary edge information to the graph attention adjacency matrix, enabling the network to explicitly model the inconsistency between the tampered and non-tampered regions. EGAM effectively suppresses interference from the non-tampered region on the tampered region, thereby improving the network's ability to recognize tampered regions. To validate the performance of EGRNet, extensive experiments are conducted on several challenging benchmark datasets. The experimental results indicate that our method outperforms current state-of-the-art image manipulation localization methods in both qualitative and quantitative evaluations. Furthermore, EGRNet demonstrates strong adaptability to different types of tampering and robustness to various attacks.
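The abstract does not give implementation details, but the EGAM idea it describes, applying binary edge information to a graph-attention adjacency matrix so that tampered and non-tampered regions exchange less information, can be illustrated with a minimal PyTorch sketch. Everything below (the class name EdgeMaskedGraphAttention, the tensor shapes, the 0.5 binarization threshold, and the residual connection) is an assumption made for illustration, not the authors' implementation.

```python
# Minimal, illustrative sketch (not the paper's EGAM) of masking a
# graph-attention adjacency matrix with a binary edge map so that
# information exchange across a predicted tampering boundary is weakened.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EdgeMaskedGraphAttention(nn.Module):
    """Graph attention over N feature nodes with an edge-aware adjacency mask."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, nodes: torch.Tensor, edge_prob: torch.Tensor) -> torch.Tensor:
        # nodes:     (B, N, C) node features, e.g. a flattened spatial feature map
        # edge_prob: (B, N)    predicted probability that a node lies on a tampering edge
        q, k, v = self.query(nodes), self.key(nodes), self.value(nodes)
        attn = torch.matmul(q, k.transpose(1, 2)) / nodes.shape[-1] ** 0.5  # (B, N, N)

        # Binarize the edge map and build an adjacency mask that only keeps
        # attention between nodes sharing the same binary edge label, which
        # suppresses interaction across the predicted tampering boundary.
        edge = (edge_prob > 0.5).float()                                     # (B, N)
        same_label = 1.0 - torch.abs(edge.unsqueeze(2) - edge.unsqueeze(1))  # (B, N, N)
        attn = attn.masked_fill(same_label < 0.5, float("-inf"))

        weights = F.softmax(attn, dim=-1)        # every row stays finite: a node matches itself
        return torch.matmul(weights, v) + nodes  # residual connection keeps original features


# Example usage with hypothetical shapes: a 64x64 feature map with 256 channels.
egam = EdgeMaskedGraphAttention(dim=256)
features = torch.randn(2, 64 * 64, 256)
edge_map = torch.rand(2, 64 * 64)
fused = egam(features, edge_map)                 # (2, 4096, 256)
```

The masking step is one plausible reading of "applies binary edge information to the graph attention adjacency matrix"; the paper may instead soft-weight the adjacency rather than hard-mask it.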
Source journal
Computer Vision and Image Understanding
Category: Engineering Technology - Engineering: Electronics & Electrical
CiteScore: 7.80
Self-citation rate: 4.40%
Annual articles: 112
Review time: 79 days
Journal description: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.

Research areas include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems