Decoupled Contrastive Learning for Long-Tailed Recognition

ArXiv Pub Date: 2024-03-10, DOI: 10.1609/aaai.v38i6.28459

Shiyu Xuan, Shiliang Zhang

Abstract

Supervised Contrastive Loss (SCL) is popular in visual representation learning. Given an anchor image, SCL pulls two types of positive samples, i.e., the anchor's augmentation and other images from the same class, toward the anchor while pushing negative images apart, thereby optimizing the learned embedding. In long-tailed recognition, where the number of samples per class is imbalanced, treating the two types of positive samples equally biases the optimization of intra-category distances. In addition, the similarity relationships among negative samples, which SCL ignores, also carry meaningful semantic cues. To improve long-tailed recognition performance, this paper addresses these two issues of SCL by decoupling the training objective. Specifically, it decouples the two types of positives in SCL and optimizes their relations toward different objectives to alleviate the influence of the imbalanced dataset. We further propose a patch-based self-distillation method that transfers knowledge from head to tail classes to relieve the under-representation of tail classes. It uses patch-based features to mine visual patterns shared among different instances and leverages a self-distillation procedure to transfer this knowledge. Experiments on several long-tailed classification benchmarks demonstrate the superiority of our method. For instance, it achieves 57.7% top-1 accuracy on the ImageNet-LT dataset. Combined with an ensemble-based method, the performance can be further boosted to 59.7%, which substantially outperforms many recent works. Our code will be released.
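The claim that treating both positive types equally biases optimization can be made concrete. In standard SCL every positive of anchor i receives weight 1/|P(i)|, so, assuming two augmented views per image, the anchor's own augmentation gets 1/(2n-1) of the pull when its class has n images in the batch: 1/15 for a head class with 8 images, but 1 for a tail class with a single image. The pure-Python sketch below compares that weighting with one hypothetical way of decoupling the positives: a fixed weight `alpha` on the augmented view and `1 - alpha` shared evenly by the other same-class samples. The function names, `alpha`, `tau`, and the two-views-per-image batch layout are illustrative assumptions, not the authors' released code, and the paper's exact decoupled objective (including its use of negative-sample similarities) may differ.

```python
import math
from typing import List, Optional, Sequence


def augmentation_weight(n_images_of_class: int, alpha: Optional[float] = None) -> float:
    """Weight the loss places on an anchor's own augmented view.

    Assumes two views per image, so the anchor has 2n - 1 same-class partners.
    alpha=None models plain SCL; otherwise the hypothetical decoupled scheme.
    """
    n_pos = 2 * n_images_of_class - 1  # same-class embeddings, excluding the anchor
    if alpha is None:
        return 1.0 / n_pos  # SCL: uniform over all positives
    return 1.0 if n_pos == 1 else alpha  # decoupled: fixed share for the augmentation


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    image_ids: Sequence[int],
    tau: float = 0.1,
    alpha: Optional[float] = None,
) -> float:
    """Mean per-anchor loss; every image must appear as exactly two views.

    alpha=None: standard SCL, each positive weighted 1/|P(i)|.
    alpha in (0, 1): hypothetical decoupled weighting, alpha on the augmented
    view and (1 - alpha) split evenly over the other same-class samples.
    """
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total = 0.0
    for i in range(n):
        others = [k for k in range(n) if k != i]
        # cosine similarity scaled by temperature
        logits = {k: sum(a * b for a, b in zip(z[i], z[k])) / tau for k in others}
        # log-sum-exp over all non-anchor samples (softmax denominator)
        m = max(logits.values())
        log_den = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        log_prob = {k: logits[k] - log_den for k in others}

        aug = [k for k in others if image_ids[k] == image_ids[i]]
        same_class = [k for k in others
                      if labels[k] == labels[i] and image_ids[k] != image_ids[i]]

        if alpha is None:
            positives = aug + same_class
            total -= sum(log_prob[p] for p in positives) / len(positives)
        elif not same_class:
            # anchor whose class has no other sample in the batch
            total -= sum(log_prob[a] for a in aug) / len(aug)
        else:
            total -= alpha * sum(log_prob[a] for a in aug) / len(aug)
            total -= (1 - alpha) * sum(log_prob[p] for p in same_class) / len(same_class)
    return total / n


# Toy batch: class 0 has two images (head), class 1 has one image (tail).
emb = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.95, -0.05], [0.0, 1.0], [0.1, 0.9]]
lab = [0, 0, 0, 0, 1, 1]
ids = [0, 0, 1, 1, 2, 2]
scl = contrastive_loss(emb, lab, ids)
decoupled = contrastive_loss(emb, lab, ids, alpha=0.5)
```

In this toy batch each head-class anchor has three positives, so `alpha=1/3` reproduces plain SCL exactly; any other `alpha` fixes the augmented view's share for head-class anchors regardless of how many same-class samples the batch happens to contain, which is the kind of frequency-independent weighting the decoupling aims at.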
