物体地标表征的等变与不变学习

2021 IEEE/CVF International Conference on Computer Vision (ICCV) Pub Date : 2021-10-01 DOI:10.1109/ICCV48922.2021.00975

Zezhou Cheng, Jong-Chyi Su, Subhransu Maji

{"title":"物体地标表征的等变与不变学习","authors":"Zezhou Cheng, Jong-Chyi Su, Subhransu Maji","doi":"10.1109/ICCV48922.2021.00975","DOIUrl":null,"url":null,"abstract":"Given a collection of images, humans are able to discover landmarks by modeling the shared geometric structure across instances. This idea of geometric equivariance has been widely used for the unsupervised discovery of object landmark representations. In this paper, we develop a simple and effective approach by combining instance-discriminative and spatially-discriminative contrastive learning. We show that when a deep network is trained to be invariant to geometric and photometric transformations, representations emerge from its intermediate layers that are highly predictive of object landmarks. Stacking these across layers in a \"hypercolumn\" and projecting them using spatially-contrastive learning further improves their performance on matching and few-shot landmark regression tasks. We also present a unified view of existing equivariant and invariant representation learning approaches through the lens of contrastive learning, shedding light on the nature of invariances learned. Experiments on standard benchmarks for landmark learning, as well as a new challenging one we propose, show that the proposed approach surpasses prior state-of-the-art.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"33 1","pages":"9877-9886"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"On Equivariant and Invariant Learning of Object Landmark Representations\",\"authors\":\"Zezhou Cheng, Jong-Chyi Su, Subhransu Maji\",\"doi\":\"10.1109/ICCV48922.2021.00975\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Given a collection of images, humans are able to discover landmarks by modeling the shared geometric structure across instances. This idea of geometric equivariance has been widely used for the unsupervised discovery of object landmark representations. In this paper, we develop a simple and effective approach by combining instance-discriminative and spatially-discriminative contrastive learning. We show that when a deep network is trained to be invariant to geometric and photometric transformations, representations emerge from its intermediate layers that are highly predictive of object landmarks. Stacking these across layers in a \\\"hypercolumn\\\" and projecting them using spatially-contrastive learning further improves their performance on matching and few-shot landmark regression tasks. We also present a unified view of existing equivariant and invariant representation learning approaches through the lens of contrastive learning, shedding light on the nature of invariances learned. Experiments on standard benchmarks for landmark learning, as well as a new challenging one we propose, show that the proposed approach surpasses prior state-of-the-art.\",\"PeriodicalId\":6820,\"journal\":{\"name\":\"2021 IEEE/CVF International Conference on Computer Vision (ICCV)\",\"volume\":\"33 1\",\"pages\":\"9877-9886\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE/CVF International Conference on Computer Vision (ICCV)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCV48922.2021.00975\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCV48922.2021.00975","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 13

摘要

给定一组图像，人类能够通过对实例之间的共享几何结构建模来发现地标。这种几何等方差的思想已被广泛应用于物体地标表示的无监督发现。本文将实例判别和空间判别相结合，提出了一种简单有效的对比学习方法。我们表明，当一个深度网络被训练成对几何和光度变换不变性时，它的中间层会出现对物体地标具有高度预测性的表示。在“超列”中堆叠这些跨层，并使用空间对比学习来投射它们，进一步提高了它们在匹配和少量地标回归任务上的性能。我们还通过对比学习的视角，对现有的等变和不变表征学习方法提出了统一的观点，揭示了所学习的不变性的本质。在里程碑式学习的标准基准以及我们提出的一个新的具有挑战性的基准上进行的实验表明，我们提出的方法超越了现有的最先进的方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

On Equivariant and Invariant Learning of Object Landmark Representations

Given a collection of images, humans are able to discover landmarks by modeling the shared geometric structure across instances. This idea of geometric equivariance has been widely used for the unsupervised discovery of object landmark representations. In this paper, we develop a simple and effective approach by combining instance-discriminative and spatially-discriminative contrastive learning. We show that when a deep network is trained to be invariant to geometric and photometric transformations, representations emerge from its intermediate layers that are highly predictive of object landmarks. Stacking these across layers in a "hypercolumn" and projecting them using spatially-contrastive learning further improves their performance on matching and few-shot landmark regression tasks. We also present a unified view of existing equivariant and invariant representation learning approaches through the lens of contrastive learning, shedding light on the nature of invariances learned. Experiments on standard benchmarks for landmark learning, as well as a new challenging one we propose, show that the proposed approach surpasses prior state-of-the-art.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2021 IEEE/CVF International Conference on Computer Vision (ICCV)

自引率

0.00%

发文量