Xiaogang Song, Yuping Tan, Xiaochang Li, Xinhong Hei
Neural Networks, Volume 188, Article 107445. Published 2025-04-05. DOI: 10.1016/j.neunet.2025.107445. Impact factor: 6.0 (JCR Q1, Computer Science, Artificial Intelligence). Article page: https://www.sciencedirect.com/science/article/pii/S0893608025003247
GDVIFNet: A generated depth and visible image fusion network with edge feature guidance for salient object detection
Despite significant recent advances in salient object detection (SOD), performance in complex interference environments remains suboptimal. To address this, additional modalities such as depth (SOD-D) or thermal imaging (SOD-T) are often introduced. However, existing methods typically rely on specialized depth or thermal devices to capture these modalities, which can be costly and inconvenient. To remove this dependence and work from a single RGB image, we propose GDVIFNet, a novel approach that leverages Depth Anything to generate depth images. Because these generated depth images may contain noise and artifacts, we incorporate self-supervised techniques to generate edge feature information; in the process of generating the edge features, the noise and artifacts present in the generated depth images can be effectively removed. Our method employs a dual-branch architecture that combines CNN-based and Transformer-based branches for feature extraction. We design the step trimodal interaction unit (STIU) to fuse the RGB features with the depth features from the CNN branch, and the self-cross attention fusion (SCF) to integrate RGB features with depth features from the Transformer branch. Finally, guided by edge features from our self-supervised edge guidance module (SEGM), we employ the CNN-Edge-Transformer step fusion (CETSF) to fuse features from both branches. Experimental results demonstrate that our method achieves state-of-the-art performance across multiple datasets. Code can be found at https://github.com/typist2001/GDVIFNet.
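To make the idea of attention-based RGB and depth fusion more concrete, the sketch below shows generic scaled dot-product cross-attention in which RGB tokens act as queries over depth tokens, followed by a residual addition. It is only an illustration under stated assumptions: the function names (`cross_attention`, `fuse_rgb_depth`), the toy two-dimensional tokens, and the residual step are hypothetical and are not taken from the paper, and the actual SCF module in the authors' repository may differ substantially.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to each key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def fuse_rgb_depth(rgb_tokens, depth_tokens):
    """Hypothetical fusion step (an assumption, not the paper's SCF):
    RGB tokens query the depth tokens, then a residual add keeps the RGB signal."""
    attended = cross_attention(rgb_tokens, depth_tokens, depth_tokens)
    return [[r + a for r, a in zip(rt, at)] for rt, at in zip(rgb_tokens, attended)]


# Toy features: three RGB tokens and two depth tokens, each of dimension 2.
rgb_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
depth_tokens = [[0.5, 0.5], [1.0, 0.0]]
fused = fuse_rgb_depth(rgb_tokens, depth_tokens)
print(fused)
```

Each fused token keeps the shape of its RGB token, and the attended part is a convex combination of the depth tokens, so depth information is injected without changing the token count. A real implementation would add learned query, key, and value projections and multiple heads.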
About the journal:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, it aims to encourage the development of biologically inspired artificial intelligence.