Zhao-Min Chen;Quan Cui;Xiaoqin Zhang;Ruoxi Deng;Chaoqun Xia;Shijian Lu
IEEE Transactions on Multimedia, vol. 27, pp. 3489-3500, published 2025-01-28
DOI: 10.1109/TMM.2025.3535395
URL: https://ieeexplore.ieee.org/document/10856413/
Journal impact factor: 9.7 (JCR Q1, Computer Science, Information Systems)
Towards Gradient Equalization and Feature Diversification for Long-Tailed Multi-Label Image Recognition
Multi-label image recognition with convolutional neural networks has achieved remarkable progress in the past few years. However, most existing multi-label image recognition methods suffer from the long-tailed data distribution problem, i.e., head categories account for most of the training samples, while tail categories have few. This work first studies the influence of long-tailed data distribution on existing multi-label image recognition methods. Based on this study, two crucial issues with existing methods are identified: 1) severe gradient imbalance between head and tail categories, even when re-balancing strategies are adopted; 2) a lack of diversity among the training samples of tail categories. To tackle the first issue, this paper proposes a group sampling strategy that creates a group-wise balanced data distribution, together with a dynamic gradient balancing loss that equalizes the gradient across all categories. To tackle the second issue, this paper proposes a diversity enhancement module that fuses information across all categories, preventing the network from overfitting to tail classes. This module also balances the gradient, improving the discriminability of the learned classifiers. The method significantly outperforms the baseline and achieves performance competitive with state-of-the-art methods on the VOC-LT and COCO-LT datasets. Extensive ablation studies verify the effectiveness of each proposed component.
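The gradient imbalance named as the first issue can be illustrated with a small numerical sketch. It is not the paper's dynamic gradient balancing loss, whose form the abstract does not give. It assumes a plain sigmoid binary cross-entropy classifier, hypothetical class frequencies, and a simple clipping rule (the helper names are made up for this example) that down-weights negative-sample gradients until they match the positive ones.

```python
import numpy as np

def bce_logit_grad(logits, labels):
    """Gradient of sigmoid binary cross-entropy w.r.t. the logits: sigmoid(z) - y."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs - labels

def pos_neg_gradient_ratio(logits, labels, neg_weights=None, eps=1e-12):
    """Per-class ratio of accumulated positive to (weighted) negative gradient magnitude."""
    grad = np.abs(bce_logit_grad(logits, labels))
    pos = (grad * labels).sum(axis=0)
    neg = (grad * (1.0 - labels)).sum(axis=0)
    if neg_weights is not None:
        neg = neg * neg_weights
    return pos / (neg + eps)

def equalizing_negative_weights(ratio):
    """Illustrative rule (an assumption): scale negatives by the ratio, capped at 1."""
    return np.clip(ratio, 0.0, 1.0)

rng = np.random.default_rng(0)
num_samples = 10_000
# Hypothetical long-tailed label frequencies: head, medium, and tail class.
class_freq = np.array([0.5, 0.05, 0.005])
labels = (rng.random((num_samples, class_freq.size)) < class_freq).astype(np.float64)
# Untrained classifier: all logits are zero, so every sample has |gradient| = 0.5.
logits = np.zeros_like(labels)

ratio = pos_neg_gradient_ratio(logits, labels)
weights = equalizing_negative_weights(ratio)
rebalanced = pos_neg_gradient_ratio(logits, labels, neg_weights=weights)

print("positive/negative gradient ratio:", ratio)
print("after down-weighting negatives: ", rebalanced)
```

At initialization the ratio equals the number of positives divided by the number of negatives for each class, so it shrinks sharply from head to tail. After the negatives are down-weighted, the ratio is close to 1 for every class.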
About the journal:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.