GroupTransNet: Group Transformer Network for RGB-D Salient Object Detection

arXiv publication date: 2022-03-21 · DOI: 10.48550/arXiv.2203.10785
Xian Fang, Jin-lei Zhu, Xiuli Shao, Hongpeng Wang
{"title":"GroupTransNet: Group Transformer Network for RGB-D Salient Object Detection","authors":"Xian Fang, Jin-lei Zhu, Xiuli Shao, Hongpeng Wang","doi":"10.48550/arXiv.2203.10785","DOIUrl":null,"url":null,"abstract":"Salient object detection on RGB-D images is an active topic in computer vision. Although the existing methods have achieved appreciable performance, there are still some challenges. The locality of convolutional neural network requires that the model has a sufficiently deep global receptive field, which always leads to the loss of local details. To address the challenge, we propose a novel Group Transformer Network (GroupTransNet) for RGB-D salient object detection. This method is good at learning the long-range dependencies of cross layer features to promote more perfect feature expression. At the beginning, the features of the slightly higher classes of the middle three levels and the latter three levels are soft grouped to absorb the advantages of the high-level features. The input features are repeatedly purified and enhanced by the attention mechanism to purify the cross modal features of color modal and depth modal. The features of the intermediate process are first fused by the features of different layers, and then processed by several transformers in multiple groups, which not only makes the size of the features of each scale unified and interrelated, but also achieves the effect of sharing the weight of the features within the group. The output features in different groups complete the clustering staggered by two owing to the level difference, and combine with the low-level features. Extensive experiments demonstrate that GroupTransNet outperforms the comparison models and achieves the new state-of-the-art performance.","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":"126 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ArXiv","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2203.10785","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4

Abstract

Salient object detection on RGB-D images is an active topic in computer vision. Although existing methods achieve appreciable performance, challenges remain. Because convolution is a local operation, a convolutional neural network must be very deep to obtain a global receptive field, and this depth often comes at the cost of local detail. To address this challenge, we propose a novel Group Transformer Network (GroupTransNet) for RGB-D salient object detection. The method learns long-range dependencies across cross-layer features to obtain richer feature representations. First, the relatively high-level features of the middle three levels and the last three levels are softly grouped so that they absorb the advantages of the high-level features. The input features are then repeatedly purified and enhanced by an attention mechanism that refines the cross-modal features of the color and depth modalities. The intermediate features are first fused with features from different layers and then processed by several transformers arranged in multiple groups, which both unifies and relates the feature sizes across scales and shares weights among the features within each group. Owing to the level difference, the output features of different groups are clustered with a stagger of two levels and then combined with the low-level features. Extensive experiments demonstrate that GroupTransNet outperforms competing models and achieves new state-of-the-art performance.
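To make the group-wise, weight-shared transformer idea in the abstract concrete, below is a minimal, hypothetical PyTorch sketch: multi-scale features assigned to one group are projected to a common channel width, resized to a common token grid, and passed through a single transformer encoder layer that is reused for every member of the group, so weights are shared within the group. The module name `GroupTransformerSketch`, the channel sizes, the token grid size, and the grouping itself are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a "group transformer" stage: features in one group share
# a single transformer encoder (weight sharing within the group), after being
# projected and resized to a common token size. Not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupTransformerSketch(nn.Module):
    def __init__(self, in_channels=(128, 256, 512), embed_dim=256, token_hw=14):
        super().__init__()
        self.token_hw = token_hw
        # 1x1 convolutions unify the channel counts of the grouped multi-scale features.
        self.projs = nn.ModuleList(
            nn.Conv2d(c, embed_dim, kernel_size=1) for c in in_channels
        )
        # One encoder layer reused for every feature in the group -> shared weights.
        self.shared_encoder = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=8, dim_feedforward=1024, batch_first=True
        )

    def forward(self, feats):
        """feats: list of feature maps from different levels of one group."""
        outs = []
        for proj, f in zip(self.projs, feats):
            x = proj(f)
            # Resize every scale to a common spatial size so tokens are comparable.
            x = F.interpolate(x, size=(self.token_hw, self.token_hw),
                              mode="bilinear", align_corners=False)
            b, c, h, w = x.shape
            tokens = x.flatten(2).transpose(1, 2)      # [B, H*W, C]
            tokens = self.shared_encoder(tokens)       # weight-shared self-attention
            outs.append(tokens.transpose(1, 2).reshape(b, c, h, w))
        return outs


if __name__ == "__main__":
    feats = [torch.randn(2, 128, 56, 56),
             torch.randn(2, 256, 28, 28),
             torch.randn(2, 512, 14, 14)]
    for o in GroupTransformerSketch()(feats):
        print(o.shape)   # all [2, 256, 14, 14]
```

In this sketch the cross-modal attention purification and the staggered clustering of group outputs described in the abstract are omitted; the block only illustrates how a shared encoder can tie together features of different scales within one group.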