GroupTransNet: Group Transformer Network for RGB-D Salient Object Detection

arXiv publication date: 2022-03-21 · DOI: 10.48550/arXiv.2203.10785
Xian Fang, Jin-lei Zhu, Xiuli Shao, Hongpeng Wang
{"title":"GroupTransNet: Group Transformer Network for RGB-D Salient Object Detection","authors":"Xian Fang, Jin-lei Zhu, Xiuli Shao, Hongpeng Wang","doi":"10.48550/arXiv.2203.10785","DOIUrl":null,"url":null,"abstract":"Salient object detection on RGB-D images is an active topic in computer vision. Although the existing methods have achieved appreciable performance, there are still some challenges. The locality of convolutional neural network requires that the model has a sufficiently deep global receptive field, which always leads to the loss of local details. To address the challenge, we propose a novel Group Transformer Network (GroupTransNet) for RGB-D salient object detection. This method is good at learning the long-range dependencies of cross layer features to promote more perfect feature expression. At the beginning, the features of the slightly higher classes of the middle three levels and the latter three levels are soft grouped to absorb the advantages of the high-level features. The input features are repeatedly purified and enhanced by the attention mechanism to purify the cross modal features of color modal and depth modal. The features of the intermediate process are first fused by the features of different layers, and then processed by several transformers in multiple groups, which not only makes the size of the features of each scale unified and interrelated, but also achieves the effect of sharing the weight of the features within the group. The output features in different groups complete the clustering staggered by two owing to the level difference, and combine with the low-level features. Extensive experiments demonstrate that GroupTransNet outperforms the comparison models and achieves the new state-of-the-art performance.","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":"126 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ArXiv","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2203.10785","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4

Abstract

Salient object detection on RGB-D images is an active topic in computer vision. Although existing methods achieve appreciable performance, challenges remain. Because convolution is a local operation, a convolutional neural network must be very deep to obtain a global receptive field, and this depth often comes at the cost of local detail. To address this challenge, we propose a novel Group Transformer Network (GroupTransNet) for RGB-D salient object detection. The method learns long-range dependencies across cross-layer features to obtain richer feature representations. First, the relatively high-level features of the middle three levels and the last three levels are softly grouped so that they absorb the advantages of the high-level features. The input features are then repeatedly purified and enhanced by an attention mechanism that refines the cross-modal features of the color and depth modalities. The intermediate features are first fused with features from different layers and then processed by several transformers arranged in multiple groups, which both unifies and relates the feature sizes across scales and shares weights among the features within each group. Owing to the level difference, the output features of different groups are clustered with a stagger of two levels and then combined with the low-level features. Extensive experiments demonstrate that GroupTransNet outperforms competing models and achieves new state-of-the-art performance.
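To make the group-wise, weight-shared transformer idea in the abstract concrete, below is a minimal, hypothetical PyTorch sketch: multi-scale features assigned to one group are projected to a common channel width, resized to a common token grid, and passed through a single transformer encoder layer that is reused for every member of the group, so weights are shared within the group. The module name `GroupTransformerSketch`, the channel sizes, the token grid size, and the grouping itself are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a "group transformer" stage: features in one group share
# a single transformer encoder (weight sharing within the group), after being
# projected and resized to a common token size. Not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupTransformerSketch(nn.Module):
    def __init__(self, in_channels=(128, 256, 512), embed_dim=256, token_hw=14):
        super().__init__()
        self.token_hw = token_hw
        # 1x1 convolutions unify the channel counts of the grouped multi-scale features.
        self.projs = nn.ModuleList(
            nn.Conv2d(c, embed_dim, kernel_size=1) for c in in_channels
        )
        # One encoder layer reused for every feature in the group -> shared weights.
        self.shared_encoder = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=8, dim_feedforward=1024, batch_first=True
        )

    def forward(self, feats):
        """feats: list of feature maps from different levels of one group."""
        outs = []
        for proj, f in zip(self.projs, feats):
            x = proj(f)
            # Resize every scale to a common spatial size so tokens are comparable.
            x = F.interpolate(x, size=(self.token_hw, self.token_hw),
                              mode="bilinear", align_corners=False)
            b, c, h, w = x.shape
            tokens = x.flatten(2).transpose(1, 2)      # [B, H*W, C]
            tokens = self.shared_encoder(tokens)       # weight-shared self-attention
            outs.append(tokens.transpose(1, 2).reshape(b, c, h, w))
        return outs


if __name__ == "__main__":
    feats = [torch.randn(2, 128, 56, 56),
             torch.randn(2, 256, 28, 28),
             torch.randn(2, 512, 14, 14)]
    for o in GroupTransformerSketch()(feats):
        print(o.shape)   # all [2, 256, 14, 14]
```

In this sketch the cross-modal attention purification and the staggered clustering of group outputs described in the abstract are omitted; the block only illustrates how a shared encoder can tie together features of different scales within one group.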