Semantic-Aware Auto-Encoders for Self-supervised Representation Learning

Guangrun Wang, Yansong Tang, Liang Lin, Philip H. S. Torr
DOI: 10.1109/CVPR52688.2022.00944
Published in: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022
Cited by: 7

Abstract

The resurgence of unsupervised learning can be attributed to the remarkable progress of self-supervised learning, which includes generative ($\mathcal{G}$) and discriminative ($\mathcal{D}$) models. In computer vision, the mainstream self-supervised learning algorithms are $\mathcal{D}$ models. However, designing a $\mathcal{D}$ model can be over-complicated; moreover, some studies have hinted that a $\mathcal{D}$ model may not be as general and interpretable as a $\mathcal{G}$ model. In this paper, we switch from $\mathcal{D}$ models to $\mathcal{G}$ models using the classical auto-encoder (AE). Note that a vanilla $\mathcal{G}$ model is far less efficient than a $\mathcal{D}$ model in self-supervised computer vision tasks, as it wastes model capacity on overfitting semantic-agnostic high-frequency details. Inspired by perceptual learning, which can use cross-view learning to perceive concepts and semantics¹, we propose a novel AE that learns semantic-aware representations via cross-view image reconstruction. We use one view of an image as the input and another view of the same image as the reconstruction target. This kind of AE has rarely been studied before, and its optimization is very difficult. To enhance the learning ability and find a feasible solution, we propose a semantic aligner that uses geometric-transformation knowledge to align the hidden code of the AE and ease optimization. These techniques significantly improve the representation learning ability of the AE and make self-supervised learning with $\mathcal{G}$ models possible. Extensive experiments on many large-scale benchmarks (e.g., ImageNet, COCO 2017, and SYSU-30k) demonstrate the effectiveness of our methods. Code is available at https://github.com/wanggrun/Semantic-Aware-AE.

¹ Following [26], we refer to semantics as visual concepts; e.g., a semantic-aware model can perceive visual concepts, and the learned features are effective for object recognition, detection, etc.
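The cross-view objective described above — encode one view of an image and reconstruct a *different* view of the same image — can be sketched with a deliberately tiny linear model. This is a hypothetical illustration only: the paper uses deep networks and a semantic aligner driven by geometric transformations, and here the two "views" are just noisy copies of one flattened image. All names (`LinearAE`, `step`) are made up for the sketch.

```python
import numpy as np

class LinearAE:
    """Toy linear auto-encoder trained with a cross-view objective:
    encode view A, decode, and penalize the reconstruction error
    against view B of the same image. (Hypothetical sketch; the paper's
    deep networks and semantic aligner are omitted here.)"""

    def __init__(self, dim, code_dim, rng):
        self.We = 0.1 * rng.standard_normal((code_dim, dim))  # encoder
        self.Wd = 0.1 * rng.standard_normal((dim, code_dim))  # decoder

    def step(self, x_in, x_target, lr=1e-2):
        # Forward pass: hidden code and reconstruction.
        h = self.We @ x_in
        recon = self.Wd @ h
        err = recon - x_target  # gradient of 0.5*||recon - x_target||^2
        # Gradients computed before either matrix is updated.
        grad_Wd = np.outer(err, h)
        grad_We = np.outer(self.Wd.T @ err, x_in)
        self.Wd -= lr * grad_Wd
        self.We -= lr * grad_We
        return float((err ** 2).mean())

rng = np.random.default_rng(0)
image = rng.standard_normal(16)  # one "image" as a flat vector
ae = LinearAE(dim=16, code_dim=4, rng=rng)

losses = []
for _ in range(300):
    # Two noisy copies stand in for the paper's augmented views.
    view_a = image + 0.05 * rng.standard_normal(16)
    view_b = image + 0.05 * rng.standard_normal(16)
    losses.append(ae.step(view_a, view_b))  # reconstruct B from A's code
```

Because the target is a different view than the input, the model cannot win by memorizing the input's pixel noise; it is pushed toward what the views share, which is the intuition behind the semantic-aware representation.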