STAMF: Synergistic transformer and mamba fusion network for RGB-Polarization based underwater salient object detection

IF 15.5 · CAS Tier 1 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Qianwen Ma, Xiaobo Li, Bincheng Li, Zhen Zhu, Jing Wu, Feng Huang, Haofeng Hu
Information Fusion, Volume 122, Article 103182. Published 10 April 2025. DOI: 10.1016/j.inffus.2025.103182
Citations: 0

Abstract

The quality of underwater imaging is severely compromised by light scattering and absorption from suspended particles, limiting the effectiveness of subsequent underwater salient object detection (USOD) tasks. Polarization information offers a unique perspective by capturing the intrinsic physical properties of objects, potentially enhancing the contrast between objects and background in complex scenes. However, it is rarely applied in the field of USOD. In this paper, we build a dataset named TJUP-USOD, which includes both RGB and polarization (i.e., RGB-P) images; based on this, we design a USOD network, called STAMF, to exploit the strengths of both color and polarization information. STAMF synthesizes these complementary information streams to generate high-contrast, vivid scene representations that improve the discernibility of underwater features. Specifically, the Omnidirectional Tokens-to-Token Vision Mamba notably amplifies the capacity to handle both global and local information by employing multidirectional scanning and iterative integration of inputs. In addition, the Mamba Cross-Modal Fusion Module adeptly merges RGB and polarization features, leveraging global context to refine local pixel-wise fusion precision and to alleviate the misguidance caused by fusing erroneous modal data in demanding underwater environments. Comparative experiments against 27 methods and extensive ablation studies demonstrate that the proposed STAMF, with only 25.85 million parameters, effectively leverages RGB-P information, achieves state-of-the-art performance, and opens a new door for USOD tasks. STAMF once again demonstrates the importance of increasing the dimensionality of USOD datasets; further exploring network structures built on such multi-dimensional data promises to enhance task performance even more. The code and dataset are publicly available: https://github.com/Kingwin97/STAMF.
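The RGB-P input relies on polarization cues that an ordinary camera does not record. As general background (and not a description of the paper's own pipeline, which is not detailed in the abstract), the degree and angle of linear polarization are conventionally derived from four polarizer-angle intensity images via the Stokes parameters; a minimal sketch, assuming a division-of-focal-plane style capture at 0°, 45°, 90°, and 135°:

```python
import numpy as np

def polarization_cues(i0, i45, i90, i135):
    """Compute standard linear-polarization cues from four
    polarizer-angle intensity images (0°, 45°, 90°, 135°)."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # Stokes S0: total intensity
    s1 = i0 - i90                       # Stokes S1: horizontal vs. vertical
    s2 = i45 - i135                     # Stokes S2: diagonal components
    eps = 1e-8                          # avoid division by zero in dark pixels
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)  # degree of linear polarization
    aolp = 0.5 * np.arctan2(s2, s1)             # angle of linear polarization
    return dolp, aolp
```

For fully horizontally polarized light (all intensity at 0°, none at 90°, half at each diagonal), this yields DoLP ≈ 1 and AoLP = 0, as expected; DoLP and AoLP maps are the kind of polarization channels an RGB-P dataset typically provides alongside color images.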
Journal

Information Fusion (Engineering & Technology – Computer Science: Theory & Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Annual articles: 161
Review time: 7.9 months
Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses as well as those demonstrating their application to real-world problems are welcome.