GDVIFNet: A generated depth and visible image fusion network with edge feature guidance for salient object detection

Impact Factor: 6.0; Zone 1 (Computer Science); JCR Q1 (Computer Science, Artificial Intelligence)

Neural Networks, Volume 188, Article 107445; Pub Date: 2025-04-05; DOI: 10.1016/j.neunet.2025.107445

Xiaogang Song, Yuping Tan, Xiaochang Li, Xinhong Hei


Citations: 0

Abstract


GDVIFNet: A generated depth and visible image fusion network with edge feature guidance for salient object detection

In recent years, despite significant advancements in salient object detection (SOD), performance in complex interference environments remains suboptimal. To address these challenges, additional modalities like depth (SOD-D) or thermal imaging (SOD-T) are often introduced. However, existing methods typically rely on specialized depth or thermal devices to capture these modalities, which can be costly and inconvenient. To address this limitation using only a single RGB image, we propose GDVIFNet, a novel approach that leverages Depth Anything to generate depth images. Since these generated depth images may contain noise and artifacts, we incorporate self-supervised techniques to generate edge feature information. During the process of generating image edge features, the noise and artifacts present in the generated depth images can be effectively removed. Our method employs a dual-branch architecture, combining CNN and Transformer-based branches for feature extraction. We designed the step trimodal interaction unit (STIU) to fuse the RGB features with the depth features from the CNN branch and the self-cross attention fusion (SCF) to integrate RGB features with depth features from the Transformer branch. Finally, guided by edge features from our self-supervised edge guidance module (SEGM), we employ the CNN-Edge-Transformer step fusion (CETSF) to fuse features from both branches. Experimental results demonstrate that our method achieves state-of-the-art performance across multiple datasets. Code can be found at https://github.com/typist2001/GDVIFNet.
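The abstract describes fusing RGB and depth features with attention in the Transformer branch. The snippet below is only a minimal sketch of generic single-head cross-attention, not the authors' SCF module: the function names `softmax` and `cross_attention_fuse` are hypothetical, the learned query/key/value projections are replaced by the identity, and the assumption that RGB tokens act as queries while depth tokens supply keys and values is ours rather than the paper's.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(rgb_tokens, depth_tokens):
    """Fuse depth tokens into RGB tokens with single-head cross-attention.

    Assumption (not from the paper): queries come from RGB, keys and values
    come from depth, and the learned projections are the identity.
    """
    dim = len(rgb_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in rgb_tokens:
        # Scaled dot-product score of this RGB token against every depth token.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in depth_tokens]
        weights = softmax(scores)
        # Attention-weighted average of the depth tokens.
        attended = [sum(w * value[i] for w, value in zip(weights, depth_tokens))
                    for i in range(dim)]
        # Residual connection keeps the original RGB information.
        fused.append([q + a for q, a in zip(query, attended)])
    return fused


# Toy input: two RGB tokens and three depth tokens, each of dimension 2.
rgb = [[1.0, 0.0], [0.0, 1.0]]
depth = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused = cross_attention_fuse(rgb, depth)
```

Each output token is the RGB token plus a convex combination of depth tokens, so the result keeps the number and dimension of the RGB tokens. A real implementation would likely use learned projections and multiple heads in a deep-learning framework; the repository linked in the abstract is the place to look for the actual design.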


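Source journal

Neural Networks (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 13.90

Self-citation rate: 7.70%

Articles published: 425

Review time: 67 days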

Journal description: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, it aims to encourage the development of biologically inspired artificial intelligence.