IF-USOD: Multimodal information fusion interactive feature enhancement architecture for underwater salient object detection

Impact Factor: 14.7 · CAS Tier 1, Computer Science · JCR Q1, Computer Science, Artificial Intelligence
Genji Yuan, Jintao Song, Jinjiang Li
{"title":"IF-USOD: Multimodal information fusion interactive feature enhancement architecture for underwater salient object detection","authors":"Genji Yuan ,&nbsp;Jintao Song ,&nbsp;Jinjiang Li","doi":"10.1016/j.inffus.2024.102806","DOIUrl":null,"url":null,"abstract":"<div><div>Underwater salient object detection (USOD) has garnered increasing attention due to its superior performance in various underwater visual tasks. Despite the growing interest, research on USOD remains in its nascent stages, with existing methods often struggling to capture long-range contextual features of salient objects. Additionally, these methods frequently overlook the complementary nature of multimodal information. The multimodal information fusion can render previously indiscernible objects more detectable, as capturing complementary features from diverse source images enables a more accurate depiction of objects. In this work, we explore an innovative approach that integrates RGB and depth information, coupled with interactive feature enhancement, to advance the detection of underwater salient objects. Our method first leverages the strengths of both transformer and convolutional neural network architectures to extract features from source images. Here, we employ a two-stage training strategy designed to optimize feature fusion. Subsequently, we utilize self-attention and cross-attention mechanisms to model the correlations among the extracted features, thereby amplifying the relevant features. Finally, to fully exploit features across different network layers, we introduce a cross-scale learning strategy to facilitate multi-scale feature fusion, which improves the detection accuracy of underwater salient objects by generating both coarse and fine salient predictions. Extensive experimental evaluations demonstrate the state-of-the-art model performance of our proposed method.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"117 ","pages":"Article 102806"},"PeriodicalIF":14.7000,"publicationDate":"2024-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253524005840","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Underwater salient object detection (USOD) has garnered increasing attention due to its superior performance in various underwater visual tasks. Despite the growing interest, research on USOD remains in its nascent stages, with existing methods often struggling to capture long-range contextual features of salient objects. Additionally, these methods frequently overlook the complementary nature of multimodal information. Multimodal information fusion can render previously indiscernible objects detectable, as capturing complementary features from diverse source images enables a more accurate depiction of objects. In this work, we explore an approach that integrates RGB and depth information, coupled with interactive feature enhancement, to advance the detection of underwater salient objects. Our method first leverages the strengths of both transformer and convolutional neural network architectures to extract features from source images, employing a two-stage training strategy designed to optimize feature fusion. Subsequently, we utilize self-attention and cross-attention mechanisms to model the correlations among the extracted features, thereby amplifying the relevant features. Finally, to fully exploit features across different network layers, we introduce a cross-scale learning strategy that facilitates multi-scale feature fusion and improves detection accuracy by generating both coarse and fine salient predictions. Extensive experimental evaluations demonstrate that the proposed method achieves state-of-the-art performance.
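The self-attention plus cross-attention fusion step described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch rendering, assuming token-sequence features from the two encoders; the module name CrossModalFusion, the single-layer design, and all tensor shapes are illustrative assumptions rather than the authors' implementation, and the two-stage training and cross-scale coarse/fine prediction heads are not reproduced here.

```python
# Minimal sketch (assumed, not the authors' code): RGB tokens are refined with
# self-attention, then query the depth tokens via cross-attention so that
# complementary depth cues amplify the salient RGB features.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Hypothetical RGB-depth fusion block with self- and cross-attention."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # rgb, depth: (batch, tokens, channels) sequences from the two encoders.
        # Self-attention models long-range context within the RGB stream.
        rgb = self.norm1(rgb + self.self_attn(rgb, rgb, rgb)[0])
        # Cross-attention: RGB tokens query depth tokens, injecting the
        # complementary modality into the fused representation.
        return self.norm2(rgb + self.cross_attn(rgb, depth, depth)[0])


if __name__ == "__main__":
    fusion = CrossModalFusion(dim=256)
    rgb = torch.randn(2, 196, 256)    # e.g. a flattened 14x14 feature map
    depth = torch.randn(2, 196, 256)
    print(fusion(rgb, depth).shape)   # torch.Size([2, 196, 256])
```

Using the RGB stream as the query and the depth stream as key/value is one common convention; a symmetric block with the roles swapped, with the two outputs merged, would equally fit the abstract's description.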
Source journal
Information Fusion (Engineering & Technology, Computer Science: Theory & Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Annual articles: 161
Review time: 7.9 months
Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.