Towards Gradient Equalization and Feature Diversification for Long-Tailed Multi-Label Image Recognition

IF 9.7 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
Zhao-Min Chen;Quan Cui;Xiaoqin Zhang;Ruoxi Deng;Chaoqun Xia;Shijian Lu
Journal: IEEE Transactions on Multimedia, vol. 27, pp. 3489-3500
Published: 2025-01-28
DOI: 10.1109/TMM.2025.3535395
URL: https://ieeexplore.ieee.org/document/10856413/
Citations: 0

Abstract

Multi-label image recognition with convolutional neural networks has made remarkable progress in the past few years. However, most existing multi-label image recognition methods suffer from long-tailed data distributions, i.e., head categories account for most training samples, while tail categories have few. This work first studies the influence of long-tailed data distribution on existing multi-label image recognition methods. Based on this study, two crucial issues with existing methods are identified: 1) severe gradient imbalance between head and tail categories, even when re-balancing strategies are adopted; 2) a lack of diversity among the training samples of tail categories. To tackle the first issue, this paper proposes a group sampling strategy that creates a group-wise balanced data distribution, together with a dynamic gradient balancing loss that equalizes the gradient across all categories. To tackle the second issue, this paper proposes a diversity enhancement module that fuses information across all categories, preventing the network from overfitting to tail classes. The module also balances the gradient, which improves the discriminability of the learned classifiers. The method significantly outperforms the baseline and achieves performance competitive with state-of-the-art methods on the VOC-LT and COCO-LT datasets. Extensive ablation studies verify the effectiveness of each proposed component.
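The gradient imbalance named in the first issue can be illustrated with a minimal sketch. This is not the paper's method: it assumes a plain sigmoid binary cross-entropy classifier, synthetic labels drawn independently per class, and all-zero logits (an untrained classifier). The function names and class frequencies are hypothetical and chosen only for illustration. For sigmoid cross-entropy the gradient with respect to a logit z is sigmoid(z) - y, so each class accumulates gradient from its positive and its negative samples separately.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_gradient_ratio(labels, logits):
    """Per class, return (sum of |grad| over positives) / (sum of |grad| over negatives).

    Uses the sigmoid binary cross-entropy gradient dL/dz = sigmoid(z) - y.
    """
    num_classes = len(labels[0])
    pos = [0.0] * num_classes
    neg = [0.0] * num_classes
    for y_row, z_row in zip(labels, logits):
        for c, (y, z) in enumerate(zip(y_row, z_row)):
            grad = abs(sigmoid(z) - y)
            if y == 1:
                pos[c] += grad
            else:
                neg[c] += grad
    return [p / n if n > 0 else float("inf") for p, n in zip(pos, neg)]


def make_long_tailed_labels(num_samples, class_probs, seed=0):
    """Draw synthetic multi-label targets; class c is positive with probability class_probs[c]."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for p in class_probs]
            for _ in range(num_samples)]


# Hypothetical frequencies: one head class, one medium class, one tail class.
class_probs = [0.5, 0.1, 0.01]
labels = make_long_tailed_labels(2000, class_probs)
# Assumption: an untrained classifier that outputs logit 0 for every class.
logits = [[0.0] * len(class_probs) for _ in labels]

ratios = bce_gradient_ratio(labels, logits)
for name, ratio in zip(["head", "medium", "tail"], ratios):
    print(f"{name}: positive/negative gradient ratio = {ratio:.3f}")
```

Under these assumptions every sample contributes a gradient of magnitude 0.5, so the ratio reduces to the number of positives divided by the number of negatives for each class. The tail class therefore receives far more negative than positive gradient, which is the kind of imbalance the abstract describes.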
Source journal
IEEE Transactions on Multimedia (Engineering & Technology: Telecommunications)
CiteScore: 11.70
Self-citation rate: 11.00%
Articles published: 576
Review time: 5.5 months
Journal description: The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.