Visualizing Paired Image Similarity in Transformer Networks

2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Pub Date: 2022-01-01, DOI: 10.1109/WACV51458.2022.00160

Samuel Black, Abby Stylianou, Robert Pless, Richard Souvenir

Citations: 2

Abstract

Transformer architectures have shown promise for a wide range of computer vision tasks, including image embedding. As was the case with convolutional neural networks and other models, explainability of the predictions is a key concern, but visualization approaches tend to be architecture-specific. In this paper, we introduce a new method for producing interpretable visualizations that, given a pair of images encoded with a Transformer, show which regions contributed to their similarity. Additionally, for the task of image retrieval, we compare the performance of Transformer and ResNet models of similar capacity and show that while they have similar performance in aggregate, the retrieved results and the visual explanations for those results are quite different. Code is available at https://github.com/vidarlab/xformer-paired-viz.
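The abstract does not say how the per-region contributions are computed. The sketch below is a hedged illustration only and is not the authors' method; their implementation is in the linked repository. It makes two assumptions:

- Each image embedding is the mean of its final-layer patch tokens. Many Vision Transformers use a CLS token instead.
- Similarity is measured as cosine similarity.

Under these assumptions, the similarity splits exactly into one score per patch, and those scores can be reshaped into a heatmap. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def norm(v: Vector) -> float:
    return math.sqrt(dot(v, v))


def mean_pool(tokens: List[Vector]) -> Vector:
    # Assumed embedding: the average of the final-layer patch tokens.
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def cosine(u: Vector, v: Vector) -> float:
    return dot(u, v) / (norm(u) * norm(v))


def token_contributions(
    tokens_a: List[Vector], tokens_b: List[Vector]
) -> Tuple[List[float], List[float]]:
    """Split cos(mean(A), mean(B)) into one score per token of each image.

    Mean pooling is linear, so each returned list sums exactly to the
    cosine similarity of the two pooled embeddings.
    """
    emb_a, emb_b = mean_pool(tokens_a), mean_pool(tokens_b)
    scale = norm(emb_a) * norm(emb_b)
    # Each token's score is its dot product with the *other* image's embedding.
    contrib_a = [dot(t, emb_b) / (len(tokens_a) * scale) for t in tokens_a]
    contrib_b = [dot(t, emb_a) / (len(tokens_b) * scale) for t in tokens_b]
    return contrib_a, contrib_b


def to_grid(values: List[float], width: int) -> List[List[float]]:
    # Reshape a flat list of per-patch scores into rows of the patch grid.
    return [values[i:i + width] for i in range(0, len(values), width)]


if __name__ == "__main__":
    # Toy 2x2 patch grid with made-up 2-D token features.
    tokens_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    tokens_b = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    heat_a, heat_b = token_contributions(tokens_a, tokens_b)
    print(to_grid([round(x, 3) for x in heat_a], width=2))
    print(round(sum(heat_a), 4),
          round(cosine(mean_pool(tokens_a), mean_pool(tokens_b)), 4))
```

Each score is a dot product with the other image's pooled embedding. As a result, the heatmap for one image changes depending on which image it is paired with. An embedding read from a CLS token is a nonlinear function of all the tokens, so it would not decompose this simply.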



