{"title":"IF-USOD: Multimodal information fusion interactive feature enhancement architecture for underwater salient object detection","authors":"Genji Yuan , Jintao Song , Jinjiang Li","doi":"10.1016/j.inffus.2024.102806","DOIUrl":null,"url":null,"abstract":"<div><div>Underwater salient object detection (USOD) has garnered increasing attention due to its superior performance in various underwater visual tasks. Despite the growing interest, research on USOD remains in its nascent stages, with existing methods often struggling to capture long-range contextual features of salient objects. Additionally, these methods frequently overlook the complementary nature of multimodal information. The multimodal information fusion can render previously indiscernible objects more detectable, as capturing complementary features from diverse source images enables a more accurate depiction of objects. In this work, we explore an innovative approach that integrates RGB and depth information, coupled with interactive feature enhancement, to advance the detection of underwater salient objects. Our method first leverages the strengths of both transformer and convolutional neural network architectures to extract features from source images. Here, we employ a two-stage training strategy designed to optimize feature fusion. Subsequently, we utilize self-attention and cross-attention mechanisms to model the correlations among the extracted features, thereby amplifying the relevant features. Finally, to fully exploit features across different network layers, we introduce a cross-scale learning strategy to facilitate multi-scale feature fusion, which improves the detection accuracy of underwater salient objects by generating both coarse and fine salient predictions. Extensive experimental evaluations demonstrate the state-of-the-art model performance of our proposed method.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"117 ","pages":"Article 102806"},"PeriodicalIF":14.7000,"publicationDate":"2024-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253524005840","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Underwater salient object detection (USOD) has garnered increasing attention due to its superior performance in various underwater visual tasks. Despite the growing interest, research on USOD remains in its nascent stages, with existing methods often struggling to capture long-range contextual features of salient objects. Additionally, these methods frequently overlook the complementary nature of multimodal information. Multimodal information fusion can render previously indiscernible objects more detectable, as capturing complementary features from diverse source images enables a more accurate depiction of objects. In this work, we explore an innovative approach that integrates RGB and depth information, coupled with interactive feature enhancement, to advance the detection of underwater salient objects. Our method first leverages the strengths of both transformer and convolutional neural network architectures to extract features from source images. Here, we employ a two-stage training strategy designed to optimize feature fusion. Subsequently, we utilize self-attention and cross-attention mechanisms to model the correlations among the extracted features, thereby amplifying the relevant features. Finally, to fully exploit features across different network layers, we introduce a cross-scale learning strategy to facilitate multi-scale feature fusion, which improves the detection accuracy of underwater salient objects by generating both coarse and fine saliency predictions. Extensive experimental evaluations demonstrate that the proposed method achieves state-of-the-art performance.
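The abstract describes the interactive feature enhancement step only at a high level. The following is a minimal PyTorch sketch of how self-attention and cross-attention can fuse RGB and depth feature tokens in this spirit; the module name InteractiveFusionBlock, the dimensions, and the wiring are illustrative assumptions, not the paper's actual IF-USOD implementation.

```python
# Illustrative sketch only: a plausible self-/cross-attention fusion of
# RGB and depth features, in the spirit of the abstract. Module names,
# dimensions, and wiring are assumptions, not the authors' architecture.
import torch
import torch.nn as nn

class InteractiveFusionBlock(nn.Module):
    """Each modality first refines itself with self-attention, then
    queries the other modality with cross-attention so that
    complementary cues are amplified."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.self_rgb = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_dep = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_rgb = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_dep = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_rgb = nn.LayerNorm(dim)
        self.norm_dep = nn.LayerNorm(dim)

    def forward(self, f_rgb: torch.Tensor, f_dep: torch.Tensor):
        # f_rgb, f_dep: (batch, tokens, dim) flattened spatial features
        # from the transformer/CNN backbones.
        r, _ = self.self_rgb(f_rgb, f_rgb, f_rgb)  # intra-modal context
        d, _ = self.self_dep(f_dep, f_dep, f_dep)
        # Cross-attention: RGB queries depth and vice versa, so each
        # modality borrows complementary structure from the other.
        r2, _ = self.cross_rgb(r, d, d)
        d2, _ = self.cross_dep(d, r, r)
        return self.norm_rgb(r + r2), self.norm_dep(d + d2)

# Usage: fuse backbone features at one scale (e.g. 14x14 patch tokens).
rgb = torch.randn(2, 196, 256)
dep = torch.randn(2, 196, 256)
fused_rgb, fused_dep = InteractiveFusionBlock()(rgb, dep)
```

The cross-attention step is what makes the enhancement "interactive": depth features can sharpen low-contrast RGB regions and vice versa, which is the complementarity the abstract highlights. A full model along these lines would stack such blocks across several scales and decode both coarse and fine saliency maps from the fused features.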
Journal Description:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.