Title: RGB-D Saliency Detection with 3D Cross-modal Fusion and Mid-level Integration
Authors: Tao Liu, Bo Li
DOI: 10.1109/ICTAI56018.2022.00201
Venue: 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI)
Publication date: 2022-10-01
Abstract
In recent years, many salient object detection (SOD) methods have introduced depth cues to boost detection performance in challenging scenes, a task termed RGB-D SOD. However, effectively fusing cross-modal features with distinct properties (i.e., RGB and depth) remains a key and unavoidable issue. Most existing methods employ simple operations, such as concatenation or summation, for cross-modal fusion, ignoring the negative effects of low-quality depth maps and thus yielding poor performance. In this paper, we design a simple yet effective fusion method that uses 3D convolution to extract both modality-specific and modality-shared information for sufficient cross-modal fusion, and combines modality weights to mitigate the interference of invalid information. In addition, we propose a novel multi-level feature integration strategy in the decoder, which explicitly incorporates low-level detail information and high-level semantic information into the mid-level features to generate accurate saliency maps. Extensive experiments on six public datasets show that our method achieves competitive results compared with 17 state-of-the-art methods.
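The fusion idea described above — stacking RGB and depth features along an extra axis, collapsing that axis with a 3D convolution, and down-weighting an unreliable modality — can be illustrated with a minimal NumPy sketch. This is not the paper's actual network: the quality score (global average pooling plus softmax), the kernel, and the function names here are illustrative assumptions, and a real implementation would use learned 3D convolution layers (e.g. `torch.nn.Conv3d`).

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def fuse(rgb, depth, kernel):
    """Sketch of 3D cross-modal fusion with modality weights.

    rgb, depth : (H, W) single-channel feature maps (assumption: one
                 channel keeps the example small).
    kernel     : (2, kh, kw) 3D kernel whose first axis spans and
                 collapses the modality axis.
    """
    # Modality weights from globally pooled responses -- an illustrative
    # proxy for the paper's weighting that suppresses low-quality depth.
    w = softmax(np.array([rgb.mean(), depth.mean()]))

    # Stack the weighted modalities along a new "depth" axis: (2, H, W).
    vol = np.stack([w[0] * rgb, w[1] * depth], axis=0)

    m, h, wd = vol.shape
    km, kh, kw = kernel.shape
    assert km == m  # the kernel spans the modality axis and collapses it

    # Zero-pad spatially so the output keeps the input resolution.
    pad = kh // 2
    padded = np.pad(vol, ((0, 0), (pad, pad), (pad, pad)))

    # Naive valid 3D convolution (correlation) collapsing the modality axis.
    out = np.zeros((h, wd))
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(padded[:, i:i + kh, j:j + kw] * kernel)
    return out
```

With equal constant inputs the two modality weights are both 0.5, so an averaging kernel of shape `(2, 3, 3)` reproduces the input value at interior pixels; a degraded depth map with a lower pooled response would receive a smaller weight before fusion.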