SiaTrans: Siamese Transformer Network for RGB-D Salient Object Detection with Depth Image Classification

Comput. Vis. Image Underst. Pub Date: 2022-07-09 DOI: 10.48550/arXiv.2207.04224

Xin Jia, Changlei Dongye, Yan-Tsung Peng

Volume 190 1, article 104549. Publication type: Journal Article. Source: Semantic Scholar. https://doi.org/10.48550/arXiv.2207.04224

Citations: 6

Abstract



RGB-D salient object detection (SOD) uses depth information to handle challenging scenes and obtain high-quality saliency maps. Most existing state-of-the-art RGB-D saliency detection methods rely on directly fusing depth information. Although these methods improve the accuracy of saliency prediction through various cross-modality fusion strategies, misleading information from poor-quality depth images can degrade the saliency prediction. To address this issue, this paper proposes a novel RGB-D salient object detection model, SiaTrans, which trains depth image quality classification jointly with SOD. Because RGB and depth images share common information about salient objects, SiaTrans uses a Siamese transformer network with shared weight parameters as the encoder and extracts RGB and depth features from inputs concatenated along the batch dimension, saving space resources without compromising performance. SiaTrans uses the class token of the backbone network (T2T-ViT) to classify the quality of depth images, while the token sequence continues to serve the saliency detection task. A transformer-based cross-modality fusion module (CMF) effectively fuses RGB and depth information. During testing, the CMF can either fuse cross-modality information or enhance RGB information, depending on the quality classification signal of the depth image. The greatest benefit of the proposed CMF and decoder is that they keep RGB and RGB-D decoding consistent: during testing, SiaTrans decodes RGB-D or RGB information with the same model parameters, according to the classification signal. Comprehensive experiments on nine RGB-D SOD benchmark datasets show that SiaTrans achieves the best overall performance and the lowest computational cost compared with recent state-of-the-art methods.
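
The abstract's two central ideas, a shared-weight encoder fed with RGB and depth stacked along the batch dimension and a quality signal that selects between fusing and using RGB alone, can be illustrated with a toy sketch. This is not the authors' implementation. The linear `encode` function stands in for the T2T-ViT backbone, element-wise averaging stands in for the transformer-based CMF, and the names `siamese_encode` and `fuse_or_enhance`, the `depth_quality` score, and the 0.5 threshold are all assumptions made for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def encode(batch: List[Vector], weights: List[Vector]) -> List[Vector]:
    """Toy linear encoder (assumption: stands in for the T2T-ViT backbone).

    Applies the same weight matrix to every sample in the batch.
    """
    return [
        [sum(w * x for w, x in zip(row, sample)) for row in weights]
        for sample in batch
    ]


def siamese_encode(
    rgb: List[Vector], depth: List[Vector], weights: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Encode both modalities in one pass with a single set of weights."""
    if len(rgb) != len(depth):
        raise ValueError("RGB and depth batches must have the same size")
    stacked = rgb + depth                # batch dimension grows from B to 2B
    features = encode(stacked, weights)  # one forward pass, shared weights
    b = len(rgb)
    return features[:b], features[b:]    # split back into RGB / depth features


def fuse_or_enhance(
    rgb_feat: Vector,
    depth_feat: Vector,
    depth_quality: float,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Vector:
    """Placeholder for the CMF decision at test time.

    Good depth: combine both modalities (averaging is a stand-in for
    transformer-based fusion). Poor depth: fall back to RGB features only.
    """
    if depth_quality >= threshold:
        return [(r + d) / 2.0 for r, d in zip(rgb_feat, depth_feat)]
    return list(rgb_feat)


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 2.0]]
    rgb = [[1.0, 2.0], [3.0, 4.0]]
    depth = [[0.5, 0.5], [1.0, 1.0]]
    rgb_f, depth_f = siamese_encode(rgb, depth, weights)
    print(fuse_or_enhance(rgb_f[0], depth_f[0], depth_quality=0.9))
    print(fuse_or_enhance(rgb_f[0], depth_f[0], depth_quality=0.1))
```

Because both modalities pass through the same function with the same weights, only one set of encoder parameters is stored, which is the space saving the abstract refers to. In the paper the quality signal is derived from the backbone's class token; here it is simply passed in as a number.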
