{"title":"基于双粒度平衡学习的长尾图像分类","authors":"Ning Ren , Xiaosong Li , Yanxia Wu , Yan Fu","doi":"10.1016/j.cviu.2025.104469","DOIUrl":null,"url":null,"abstract":"<div><div>In long-tailed datasets, the training of deep neural network-based models faces challenges, where the model may become biased towards the head classes with abundant training data, resulting in poor performance on tail classes with limited samples. Most current methods employ contrastive learning to learn more balanced representations by finding the class center. However, these methods use class centers to address local imbalance within a mini-batch, they overlook the global imbalance between batches throughout an epoch, caused by the long-tailed distribution of the dataset. In this paper, we propose <strong>bi-granularity balance</strong> learning to address the two-layer imbalance. We decouple the attraction–repulsion term in contrastive loss into two independent components: global and local balance. The global balance component focuses on capturing semantic information from different perspectives of the image and shifting learning attention from the head classes to the tail classes in the global perspective. The local balance component aims to learn inter-class separability from the local perspective. The proposed method efficiently learns the intra-class compactness and inter-class separability in long-tailed model training and improves the performance of the long-tailed model. Experimental results show that the proposed method achieves competitive performance on long-tailed benchmarks such as CIFAR-10/100-LT, TinyImageNet-LT, and iNaturalist 2018.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"261 ","pages":"Article 104469"},"PeriodicalIF":3.5000,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Bi-granularity balance learning for long-tailed image classification\",\"authors\":\"Ning Ren , Xiaosong Li , Yanxia Wu , Yan Fu\",\"doi\":\"10.1016/j.cviu.2025.104469\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In long-tailed datasets, the training of deep neural network-based models faces challenges, where the model may become biased towards the head classes with abundant training data, resulting in poor performance on tail classes with limited samples. Most current methods employ contrastive learning to learn more balanced representations by finding the class center. However, these methods use class centers to address local imbalance within a mini-batch, they overlook the global imbalance between batches throughout an epoch, caused by the long-tailed distribution of the dataset. In this paper, we propose <strong>bi-granularity balance</strong> learning to address the two-layer imbalance. We decouple the attraction–repulsion term in contrastive loss into two independent components: global and local balance. The global balance component focuses on capturing semantic information from different perspectives of the image and shifting learning attention from the head classes to the tail classes in the global perspective. The local balance component aims to learn inter-class separability from the local perspective. The proposed method efficiently learns the intra-class compactness and inter-class separability in long-tailed model training and improves the performance of the long-tailed model. 
Experimental results show that the proposed method achieves competitive performance on long-tailed benchmarks such as CIFAR-10/100-LT, TinyImageNet-LT, and iNaturalist 2018.</div></div>\",\"PeriodicalId\":50633,\"journal\":{\"name\":\"Computer Vision and Image Understanding\",\"volume\":\"261 \",\"pages\":\"Article 104469\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2025-08-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Vision and Image Understanding\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1077314225001924\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1077314225001924","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Bi-granularity balance learning for long-tailed image classification
In long-tailed datasets, training deep neural network-based models is challenging: the model tends to become biased towards head classes with abundant training data, resulting in poor performance on tail classes with limited samples. Most current methods employ contrastive learning to learn more balanced representations by finding class centers. However, while these methods use class centers to address local imbalance within a mini-batch, they overlook the global imbalance between batches across an epoch that is caused by the long-tailed distribution of the dataset. In this paper, we propose bi-granularity balance learning to address both levels of imbalance. We decouple the attraction–repulsion term in the contrastive loss into two independent components: global balance and local balance. The global balance component captures semantic information from different perspectives of the image and shifts learning attention from the head classes to the tail classes at the global level. The local balance component learns inter-class separability from the local perspective. The proposed method efficiently learns intra-class compactness and inter-class separability during long-tailed model training and improves the performance of the long-tailed model. Experimental results show that the proposed method achieves competitive performance on long-tailed benchmarks such as CIFAR-10/100-LT, TinyImageNet-LT, and iNaturalist 2018.
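The abstract does not give the exact loss, so the following is only a minimal illustrative sketch, in PyTorch, of how an attraction–repulsion decoupling could be combined with a global class-frequency reweighting (shifting attention to tail classes) and a local inter-class repulsion inside the mini-batch. The function name, arguments, and weighting scheme are assumptions for illustration, not the authors' formulation.

```python
# Hypothetical sketch: the attraction term pulls features toward their class
# centers (intra-class compactness), while the repulsion term pushes apart
# features of different classes in the mini-batch (inter-class separability).
# A per-class weight derived from global class frequencies gives tail classes
# proportionally more gradient signal; all names here are illustrative.
import torch
import torch.nn.functional as F


def bi_granularity_contrastive_loss(features, labels, class_centers,
                                    class_counts, temperature=0.1):
    """features: (B, D) embeddings; labels: (B,) class indices;
    class_centers: (C, D) running class centers; class_counts: (C,) global frequencies."""
    features = F.normalize(features, dim=1)
    centers = F.normalize(class_centers, dim=1)

    # Global balance (attraction): pull each sample toward its class center,
    # reweighted inversely by the class's global frequency.
    center_sim = features @ centers.t() / temperature           # (B, C)
    global_weights = 1.0 / class_counts.float()                  # (C,)
    global_weights = global_weights / global_weights.sum() * len(class_counts)
    attraction = F.cross_entropy(center_sim, labels,
                                 weight=global_weights.to(features.device))

    # Local balance (repulsion): within the mini-batch, penalise high
    # similarity between samples that belong to different classes.
    sim = features @ features.t() / temperature                  # (B, B)
    diff_class = (labels.unsqueeze(0) != labels.unsqueeze(1)).float()
    repulsion = (sim.exp() * diff_class).sum(dim=1).log1p().mean()

    return attraction + repulsion
```

In this sketch the global term is a frequency-weighted class-center classifier and the local term only penalises cross-class similarity inside the batch; the paper's actual components and how they capture multiple perspectives of an image will differ in detail.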
Journal introduction:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems