Heterogeneous Experts and Hierarchical Perception for Underwater Salient Object Detection

Mingfeng Zha; Guoqing Wang; Yunqiang Pei; Tianyu Li; Xiongxin Tang; Chongyi Li; Yang Yang; Heng Tao Shen

IEEE Transactions on Image Processing, vol. 34, pp. 3703-3717. Published 2025-06-02. DOI: 10.1109/TIP.2025.3572760. https://ieeexplore.ieee.org/document/11018233/
Abstract
Existing underwater salient object detection (USOD) methods design fusion strategies to integrate multimodal information, but leave modality-specific characteristics underexplored. To address this, we leverage separate RGB and depth branches to learn disentangled representations, formulating the heterogeneous experts and hierarchical perception network (HEHP). Specifically, to reduce modal discrepancies, we propose the hierarchical prototype guided interaction (HPI), which achieves fine-grained alignment guided by semantic prototypes and then refines the features with complementary cues from the other modality. We further design the mixture of frequency experts (MoFE), whose experts specialize in modeling high- and low-frequency components, respectively, and collaborate to explicitly obtain hierarchical representations. To efficiently integrate diverse spatial and frequency information, we formulate the four-way fusion experts (FFE), which dynamically select the optimal experts for fusion while remaining sensitive to scale and orientation. Since poor-quality depth maps inevitably introduce noise, we design the uncertainty injection (UI) to explore high-uncertainty regions by establishing pixel-level probability distributions. We further formulate the holistic prototype contrastive (HPC) loss based on semantics and patches to learn compact and general representations across modalities and images. Finally, we apply different supervision to the two branches to implicitly model their differences. Extensive experiments on two USOD datasets and four related underwater scene benchmarks validate the effectiveness of the proposed method, which surpasses state-of-the-art binary detection models. Impressive results on seven natural scene benchmarks further demonstrate its scalability.
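The abstract provides no reference code, but the frequency-expert idea can be made concrete. Below is a minimal PyTorch sketch of a two-expert mixture in the spirit of MoFE, assuming an FFT-based split into low- and high-frequency components and a learned per-sample gate; the class name, the box-shaped low-pass mask, and the `cutoff` parameter are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class MoFESketch(nn.Module):
    """Two-expert frequency mixture (illustrative sketch, not the authors' code)."""

    def __init__(self, channels: int, cutoff: float = 0.25):
        super().__init__()
        self.cutoff = cutoff  # fraction of the centered spectrum kept as low-frequency
        self.low_expert = nn.Conv2d(channels, channels, 3, padding=1)   # smooth structure
        self.high_expert = nn.Conv2d(channels, channels, 3, padding=1)  # edges and texture
        self.gate = nn.Sequential(                                      # per-sample mixing weights
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, 2, 1),
            nn.Softmax(dim=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        freq = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))
        # Box-shaped low-pass mask around the (centered) zero frequency.
        yy = torch.arange(h, device=x.device).view(1, 1, h, 1) - h // 2
        xx = torch.arange(w, device=x.device).view(1, 1, 1, w) - w // 2
        mask = (yy.abs() <= self.cutoff * h / 2) & (xx.abs() <= self.cutoff * w / 2)
        low = torch.fft.ifft2(
            torch.fft.ifftshift(freq * mask, dim=(-2, -1)), norm="ortho"
        ).real
        high = x - low  # the residual carries the high-frequency detail
        weights = self.gate(x)  # (B, 2, 1, 1) mixing coefficients
        return weights[:, :1] * self.low_expert(low) + weights[:, 1:] * self.high_expert(high)
```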
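The uncertainty injection described for noisy depth maps is also amenable to a short sketch. One common way to realize "pixel-level probability distributions" is to predict a per-pixel Gaussian over the depth features and sample from it with the reparameterization trick; the heads and the channel-averaged uncertainty map below are assumptions for illustration, not the paper's UI design.

```python
import torch
import torch.nn as nn


class UncertaintyInjectionSketch(nn.Module):
    """Per-pixel Gaussian over depth features (illustrative assumption)."""

    def __init__(self, channels: int):
        super().__init__()
        self.mu_head = nn.Conv2d(channels, channels, 3, padding=1)      # distribution mean
        self.logvar_head = nn.Conv2d(channels, channels, 3, padding=1)  # log-variance

    def forward(self, depth_feat: torch.Tensor):
        mu = self.mu_head(depth_feat)
        logvar = self.logvar_head(depth_feat)
        sigma = torch.exp(0.5 * logvar)
        # Reparameterization trick: differentiable sampling from N(mu, sigma^2).
        sample = mu + sigma * torch.randn_like(sigma)
        # Channel-averaged sigma gives a (B, 1, H, W) uncertainty map that
        # flags depth regions the downstream fusion should distrust.
        uncertainty = sigma.mean(dim=1, keepdim=True)
        return sample, uncertainty
```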
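Finally, a prototype contrastive objective such as the HPC loss is typically instantiated as an InfoNCE-style term between embeddings and their semantic prototypes. The function below is a generic sketch of that pattern, assuming patch embeddings, a bank of prototypes, and per-embedding prototype labels; the signature and `temperature` default are hypothetical, not taken from the paper.

```python
import torch
import torch.nn.functional as F


def prototype_contrastive_loss(feats: torch.Tensor,
                               prototypes: torch.Tensor,
                               labels: torch.Tensor,
                               temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style contrast between embeddings and semantic prototypes.

    feats:      (N, D) patch or region embeddings
    prototypes: (K, D) semantic prototypes
    labels:     (N,) index of the matching prototype per embedding
    """
    feats = F.normalize(feats, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    # Cosine similarities scaled by temperature; the matched prototype is
    # the positive class and all other prototypes act as negatives.
    logits = feats @ prototypes.t() / temperature
    return F.cross_entropy(logits, labels)
```

Pulling each embedding toward its own prototype while pushing it from the rest is one straightforward way to obtain the compact, cross-modal representations the abstract describes.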