Towards Gradient Equalization and Feature Diversification for Long-Tailed Multi-Label Image Recognition

IF 9.7 | Zone 1 (Computer Science) | Q1, Computer Science, Information Systems

IEEE Transactions on Multimedia Pub Date : 2025-01-28 DOI:10.1109/TMM.2025.3535395

Zhao-Min Chen;Quan Cui;Xiaoqin Zhang;Ruoxi Deng;Chaoqun Xia;Shijian Lu

Volume 27, pages 3489-3500. https://ieeexplore.ieee.org/document/10856413/

Citations: 0

Abstract

Multi-label image recognition with convolutional neural networks has achieved remarkable progress in the past few years. However, most existing multi-label image recognition methods suffer from the long-tailed data distribution problem, i.e., head categories account for most training samples, while tail classes have few. This work first studies the influence of long-tailed data distribution on existing multi-label image recognition methods. Based on this study, two crucial issues of the existing methods are identified: 1) severe gradient imbalance between head and tail categories, even when re-balancing strategies are adopted; 2) a lack of diversity among tail-category training samples. To tackle the first issue, this paper proposes a group sampling strategy that creates a group-wise balanced data distribution, together with a dynamic gradient balancing loss that equalizes the gradient across all categories. To tackle the second issue, this paper proposes a diversity enhancement module that fuses information across all categories, preventing the network from overfitting tail classes. The module also balances the gradient, improving the discriminability of the learned classifiers. The method significantly outperforms the baseline and performs competitively with state-of-the-art methods on the VOC-LT and COCO-LT datasets. Extensive ablation studies verify the effectiveness of the proposed components.
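
As a rough illustration of the first issue, the sketch below shows why rare classes receive mostly negative gradient under a plain sigmoid and binary cross-entropy classifier. It is not the authors' group sampling strategy or their dynamic gradient balancing loss, whose details the abstract does not give. The synthetic label generator, the function names, and the choice to evaluate every prediction at p = 0.5 (an untrained classifier) are all assumptions made for illustration. The only fact relied on is that, for sigmoid plus binary cross-entropy, the gradient of the loss with respect to the logit is p - y.

```python
import random


def make_long_tailed_labels(num_samples, num_classes, head_prob=0.5, decay=0.5, seed=0):
    """Hypothetical synthetic data: class c is positive with probability head_prob * decay**c."""
    rng = random.Random(seed)
    probs = [head_prob * decay ** c for c in range(num_classes)]
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(num_samples)]


def pos_neg_gradient_ratio(labels, prob=0.5):
    """Per-class ratio of accumulated positive to negative gradient magnitude.

    For sigmoid + binary cross-entropy, dL/dz = p - y, so each positive label
    contributes |p - 1| and each negative label contributes |p|.
    Assumes every prediction equals `prob` (an untrained classifier).
    """
    num_classes = len(labels[0])
    ratios = []
    for c in range(num_classes):
        pos = sum(abs(prob - 1.0) for row in labels if row[c] == 1)
        neg = sum(abs(prob) for row in labels if row[c] == 0)
        ratios.append(pos / neg if neg > 0 else float("inf"))
    return ratios


labels = make_long_tailed_labels(num_samples=2000, num_classes=6)
ratios = pos_neg_gradient_ratio(labels)
for c, r in enumerate(ratios):
    print(f"class {c}: positive/negative gradient ratio = {r:.3f}")
```

In this toy setting the ratio falls as classes get rarer, so tail classifiers are pushed mainly by negative samples. This is the kind of imbalance that a gradient-equalizing loss is meant to counteract.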


Source journal

IEEE Transactions on Multimedia (Engineering and Technology: Telecommunications)

CiteScore: 11.70

Self-citation rate: 11.00%

Articles published: 576

Review time: 5.5 months

About the journal: The IEEE Transactions on Multimedia covers diverse aspects of multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. Its scope aligns with the Fields of Interest of its sponsors.