Group-Wise Co-Salient Object Detection with Siamese Transformers Via Brownian Distance Covariance Matching

Yangjun Wu, H. Zhang, Lingyan Liang, Yaqian Zhao, Kaihua Zhang
{"title":"基于布朗距离协方差匹配的暹罗变压器群智共显著目标检测","authors":"Yangjun Wu, H. Zhang, Lingyan Liang, Yaqian Zhao, Kaihua Zhang","doi":"10.1109/ICASSP49357.2023.10096177","DOIUrl":null,"url":null,"abstract":"Co-salient object detection (CoSOD) aims to discover and segment foreground targets in a group of images with the same semantic category. Existing mainstream approaches often employ convolutional neural networks (CNNs) to learn the semantic-invariant features from a group of images. Despite demonstrated success, there exist two limitations: 1) The CNNs introduce the inductive bias of locality that are difficult to model long-range dependency, limiting their feature representation capability. 2) Their models lack discriminability to differentiate semantic differences between different groups since only one group of images with the same semantic category has been taken into account for model training. To address these issues, this paper presents a Siamese Transformer architecture for CoSOD that can fully mine the group-wise semantic contrast information for more discriminative feature learning. Specifically, the designed Siamese Transformer takes two groups of images as input for feature contrastive learning. Each group is processed by a Transformer branch with shared weights to capture the long-range interaction information. Besides, to model the complex non-linear interactions between these two branches, we further design a Brownian distance covariance (BDC) module that uses joint distribution to measure the inter- and intra-group semantic similarity. The BDC can be efficiently calculated in closed form that can fully characterize independence for effective feature contrastive learning. Extensive evaluations on the three largest and most challenging benchmark datasets (CoSal2015, CoCA, and CoSOD3k) demonstrate the superiority of our method over a variety of state-of-the-art methods.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Group-Wise Co-Salient Object Detection with Siamese Transformers Via Brownian Distance Covariance Matching\",\"authors\":\"Yangjun Wu, H. Zhang, Lingyan Liang, Yaqian Zhao, Kaihua Zhang\",\"doi\":\"10.1109/ICASSP49357.2023.10096177\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Co-salient object detection (CoSOD) aims to discover and segment foreground targets in a group of images with the same semantic category. Existing mainstream approaches often employ convolutional neural networks (CNNs) to learn the semantic-invariant features from a group of images. Despite demonstrated success, there exist two limitations: 1) The CNNs introduce the inductive bias of locality that are difficult to model long-range dependency, limiting their feature representation capability. 2) Their models lack discriminability to differentiate semantic differences between different groups since only one group of images with the same semantic category has been taken into account for model training. To address these issues, this paper presents a Siamese Transformer architecture for CoSOD that can fully mine the group-wise semantic contrast information for more discriminative feature learning. 
Specifically, the designed Siamese Transformer takes two groups of images as input for feature contrastive learning. Each group is processed by a Transformer branch with shared weights to capture the long-range interaction information. Besides, to model the complex non-linear interactions between these two branches, we further design a Brownian distance covariance (BDC) module that uses joint distribution to measure the inter- and intra-group semantic similarity. The BDC can be efficiently calculated in closed form that can fully characterize independence for effective feature contrastive learning. Extensive evaluations on the three largest and most challenging benchmark datasets (CoSal2015, CoCA, and CoSOD3k) demonstrate the superiority of our method over a variety of state-of-the-art methods.\",\"PeriodicalId\":113072,\"journal\":{\"name\":\"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)\",\"volume\":\"55 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICASSP49357.2023.10096177\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP49357.2023.10096177","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract

Co-salient object detection (CoSOD) aims to discover and segment foreground targets in a group of images that share the same semantic category. Existing mainstream approaches often employ convolutional neural networks (CNNs) to learn semantic-invariant features from a group of images. Despite demonstrated success, two limitations remain: 1) CNNs introduce the inductive bias of locality, which makes it difficult to model long-range dependencies and limits their feature representation capability. 2) These models lack the discriminability to differentiate semantic differences between groups, since only a single group of images with the same semantic category is considered during training. To address these issues, this paper presents a Siamese Transformer architecture for CoSOD that can fully mine group-wise semantic contrast information for more discriminative feature learning. Specifically, the designed Siamese Transformer takes two groups of images as input for feature contrastive learning. Each group is processed by a Transformer branch with shared weights to capture long-range interaction information. In addition, to model the complex non-linear interactions between these two branches, we further design a Brownian distance covariance (BDC) module that uses the joint distribution to measure inter- and intra-group semantic similarity. The BDC can be efficiently computed in closed form and fully characterizes independence, enabling effective feature contrastive learning. Extensive evaluations on the three largest and most challenging benchmark datasets (CoSal2015, CoCA, and CoSOD3k) demonstrate the superiority of our method over a variety of state-of-the-art methods.
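For concreteness, the sketch below illustrates the closed-form sample Brownian distance covariance (in the classical Székely–Rizzo formulation) that the abstract alludes to: the pairwise Euclidean distance matrices of the two feature sets are double-centered, and the average of their elementwise product yields a statistic that is zero in the population limit if and only if the two sets are statistically independent. This is a minimal illustration under stated assumptions; the function names and the pooled-feature inputs are hypothetical, and the paper's actual BDC module applied to the Siamese Transformer branch features may differ in detail.

```python
# Minimal NumPy sketch of the classical sample Brownian distance covariance.
# Illustrative only: not the paper's implementation.
import numpy as np


def pairwise_distances(x: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix for rows of x, shape (n, d) -> (n, n)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))


def double_center(d: np.ndarray) -> np.ndarray:
    """Subtract row and column means, then add back the grand mean."""
    return d - d.mean(0, keepdims=True) - d.mean(1, keepdims=True) + d.mean()


def brownian_distance_covariance(x: np.ndarray, y: np.ndarray) -> float:
    """Squared sample distance covariance between paired samples x and y.

    x, y: arrays of shape (n, d1) and (n, d2) with the same number of rows n.
    """
    a = double_center(pairwise_distances(x))
    b = double_center(pairwise_distances(y))
    n = x.shape[0]
    return float((a * b).sum() / (n * n))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for pooled features from the two Siamese branches.
    feats_group_a = rng.normal(size=(64, 256))
    feats_group_b = 0.5 * feats_group_a + 0.1 * rng.normal(size=(64, 256))
    print(brownian_distance_covariance(feats_group_a, feats_group_b))
```

Because the statistic is built from distance matrices rather than raw inner products, it captures non-linear dependence between the two branches, which is presumably why the paper prefers it over a plain covariance for group-wise contrastive matching.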