Unsupervised Sounding Object Localization with Bottom-Up and Top-Down Attention

2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Pub Date: 2022-01-01, DOI: 10.1109/WACV51458.2022.00222

Jiaying Shi, Chao Ma

Citations: 8

Abstract

Learning to localize sounding objects in visual scenes without manual annotations has recently drawn increasing attention. In this paper, we propose an unsupervised sounding object localization algorithm that uses bottom-up and top-down attention in visual scenes. The bottom-up attention module generates an objectness confidence map, while the top-down attention module measures the similarity between sound and visual regions. Moreover, we propose a bottom-up attention loss function, which models the correlation between bottom-up and top-down attention. Extensive experimental results demonstrate that our proposed method significantly outperforms state-of-the-art unsupervised methods. The source code is available at https://github.com/VISION-SJTU/USOL.
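To make the two attention signals concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes the top-down map is a cosine similarity between a single audio embedding and each location of a visual feature map, and it uses one minus the Pearson correlation as a stand-in for a loss that ties the two maps together. The function names `top_down_attention` and `correlation_loss`, the feature shapes, and the specific loss form are all hypothetical; the abstract does not specify them.

```python
import numpy as np


def top_down_attention(visual_feats, audio_emb, eps=1e-8):
    """Cosine similarity between one audio embedding and every visual location.

    visual_feats: (H, W, C) array; audio_emb: (C,) array.
    Returns an (H, W) map with values in [-1, 1].
    Assumption: cosine similarity is an illustrative choice of similarity.
    """
    v = visual_feats / (np.linalg.norm(visual_feats, axis=-1, keepdims=True) + eps)
    a = audio_emb / (np.linalg.norm(audio_emb) + eps)
    return v @ a


def correlation_loss(bottom_up, top_down, eps=1e-8):
    """Hypothetical loss: 1 - Pearson correlation between the two maps.

    Near 0 when the maps are perfectly correlated, near 2 when anti-correlated.
    """
    b = bottom_up.ravel() - bottom_up.mean()
    t = top_down.ravel() - top_down.mean()
    r = (b @ t) / (np.linalg.norm(b) * np.linalg.norm(t) + eps)
    return 1.0 - r


# Toy data: random visual features, with the audio embedding copied from
# location (3, 4) so that the similarity map should peak there.
rng = np.random.default_rng(0)
visual_feats = rng.normal(size=(7, 7, 16))
audio_emb = visual_feats[3, 4].copy()

td = top_down_attention(visual_feats, audio_emb)
peak = np.unravel_index(np.argmax(td), td.shape)
```

In this toy setup `peak` is the location whose feature was copied into `audio_emb`. A real objectness map from a bottom-up module would take the place of the `bottom_up` argument of `correlation_loss`.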



