Qianwen Ma , Xiaobo Li , Bincheng Li , Zhen Zhu , Jing Wu , Feng Huang , Haofeng Hu
{"title":"基于rgb偏振的水下显著目标检测的协同变压器和曼巴融合网络","authors":"Qianwen Ma , Xiaobo Li , Bincheng Li , Zhen Zhu , Jing Wu , Feng Huang , Haofeng Hu","doi":"10.1016/j.inffus.2025.103182","DOIUrl":null,"url":null,"abstract":"<div><div>The quality of underwater imaging is severely compromised due to the light scattering and absorption caused by suspended particles, limiting the effectiveness of following underwater salient object detection (USOD) tasks. Polarization information offers a unique perspective by interpreting the intrinsic physical properties of objects, potentially enhancing the contrast between objects and background in complex scenes. However, it is rarely applied in the field of USOD. In this paper, we build a dataset named TJUP-USOD, which includes both RGB and polarization (i,e., RGB-P) images; based on this, we design a USOD network, called STAMF, to explore the strengths of both color and polarization information. STAMF synthesizes these complementary information streams to generate high-contrast, vivid scene representations that improve the discernibility of underwater features. Specifically, the Omnidirectional Tokens-to-Token Vision Mamba notably amplifies the capacity to handle both global and local information by employing multidirectional scanning and iterative integration of inputs. Besides, introducing the Mamba Cross-Modal Fusion Module adeptly merges RGB and polarization features, amalgamating global insights to refine local pixel-wise fusion precision and alleviate overall misguidance resulting from the fusion of erroneous modal data in demanding underwater environments. Comparative experiments with 27 methods and extensive ablation study results demonstrate that, the proposed STAMF, with only 25.85 million parameters, effectively leverages RGB-P information, achieving state-of-the-art performance, and opens a new door for the USOD tasks. The proposed STAMF once again demonstrates the importance of increasing the dimensionality of the dataset for USOD; and further exploring the advantages of network structures based on multi-dimensional data will further enhance task performance. The code and dataset are publicly available: <span><span>https://github.com/Kingwin97/STAMF</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"122 ","pages":"Article 103182"},"PeriodicalIF":15.5000,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"STAMF: Synergistic transformer and mamba fusion network for RGB-Polarization based underwater salient object detection\",\"authors\":\"Qianwen Ma , Xiaobo Li , Bincheng Li , Zhen Zhu , Jing Wu , Feng Huang , Haofeng Hu\",\"doi\":\"10.1016/j.inffus.2025.103182\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The quality of underwater imaging is severely compromised due to the light scattering and absorption caused by suspended particles, limiting the effectiveness of following underwater salient object detection (USOD) tasks. Polarization information offers a unique perspective by interpreting the intrinsic physical properties of objects, potentially enhancing the contrast between objects and background in complex scenes. However, it is rarely applied in the field of USOD. In this paper, we build a dataset named TJUP-USOD, which includes both RGB and polarization (i,e., RGB-P) images; based on this, we design a USOD network, called STAMF, to explore the strengths of both color and polarization information. STAMF synthesizes these complementary information streams to generate high-contrast, vivid scene representations that improve the discernibility of underwater features. Specifically, the Omnidirectional Tokens-to-Token Vision Mamba notably amplifies the capacity to handle both global and local information by employing multidirectional scanning and iterative integration of inputs. Besides, introducing the Mamba Cross-Modal Fusion Module adeptly merges RGB and polarization features, amalgamating global insights to refine local pixel-wise fusion precision and alleviate overall misguidance resulting from the fusion of erroneous modal data in demanding underwater environments. Comparative experiments with 27 methods and extensive ablation study results demonstrate that, the proposed STAMF, with only 25.85 million parameters, effectively leverages RGB-P information, achieving state-of-the-art performance, and opens a new door for the USOD tasks. The proposed STAMF once again demonstrates the importance of increasing the dimensionality of the dataset for USOD; and further exploring the advantages of network structures based on multi-dimensional data will further enhance task performance. The code and dataset are publicly available: <span><span>https://github.com/Kingwin97/STAMF</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":50367,\"journal\":{\"name\":\"Information Fusion\",\"volume\":\"122 \",\"pages\":\"Article 103182\"},\"PeriodicalIF\":15.5000,\"publicationDate\":\"2025-04-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Fusion\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1566253525002556\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525002556","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
STAMF: Synergistic transformer and mamba fusion network for RGB-Polarization based underwater salient object detection
The quality of underwater imaging is severely compromised due to the light scattering and absorption caused by suspended particles, limiting the effectiveness of following underwater salient object detection (USOD) tasks. Polarization information offers a unique perspective by interpreting the intrinsic physical properties of objects, potentially enhancing the contrast between objects and background in complex scenes. However, it is rarely applied in the field of USOD. In this paper, we build a dataset named TJUP-USOD, which includes both RGB and polarization (i,e., RGB-P) images; based on this, we design a USOD network, called STAMF, to explore the strengths of both color and polarization information. STAMF synthesizes these complementary information streams to generate high-contrast, vivid scene representations that improve the discernibility of underwater features. Specifically, the Omnidirectional Tokens-to-Token Vision Mamba notably amplifies the capacity to handle both global and local information by employing multidirectional scanning and iterative integration of inputs. Besides, introducing the Mamba Cross-Modal Fusion Module adeptly merges RGB and polarization features, amalgamating global insights to refine local pixel-wise fusion precision and alleviate overall misguidance resulting from the fusion of erroneous modal data in demanding underwater environments. Comparative experiments with 27 methods and extensive ablation study results demonstrate that, the proposed STAMF, with only 25.85 million parameters, effectively leverages RGB-P information, achieving state-of-the-art performance, and opens a new door for the USOD tasks. The proposed STAMF once again demonstrates the importance of increasing the dimensionality of the dataset for USOD; and further exploring the advantages of network structures based on multi-dimensional data will further enhance task performance. The code and dataset are publicly available: https://github.com/Kingwin97/STAMF.
期刊介绍:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.