Towards Imbalanced Large Scale Multi-label Classification with Partially Annotated Labels

Xin Zhang, Yuqi Song, Fei Zuo, Xiaofeng Wang
{"title":"Towards Imbalanced Large Scale Multi-label Classification with Partially Annotated Labels","authors":"Xin Zhang, Yuqi Song, Fei Zuo, Xiaofeng Wang","doi":"10.1109/SERA57763.2023.10197667","DOIUrl":null,"url":null,"abstract":"Multi-label classification is a widely encountered problem in daily life, where an instance can be associated with multiple classes. In theory, this is a supervised learning method that requires a large amount of labeling. However, annotating data is time-consuming and may be infeasible for huge labeling spaces. In addition, label imbalance can limit the performance of multi-label classifiers, especially when some labels are missing. Therefore, it is meaningful to study how to train neural networks using partial labels. In this work, we address the issue of label imbalance and investigate how to train classifiers using partial labels in large labeling spaces. First, we introduce the pseudo-labeling technique, which allows commonly adopted networks to be applied in partially labeled settings without the need for additional complex structures. Then, we propose a novel loss function that leverages statistical information from existing datasets to effectively alleviate the label imbalance problem. In addition, we design a dynamic training scheme to reduce the dimension of the labeling space and further mitigate the imbalance. Finally, we conduct extensive experiments on some publicly available multi-label datasets such as COCO, NUS-WIDE, CUB, and Open Images to demonstrate the effectiveness of the proposed approach. The results show that our approach outperforms several state-of-the-art methods, and surprisingly, in some partial labeling settings, our approach even exceeds the methods trained with full labels.","PeriodicalId":211080,"journal":{"name":"2023 IEEE/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SERA57763.2023.10197667","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Multi-label classification is a widely encountered problem in daily life, where an instance can be associated with multiple classes. In theory, this is a supervised learning method that requires a large amount of labeling. However, annotating data is time-consuming and may be infeasible for huge labeling spaces. In addition, label imbalance can limit the performance of multi-label classifiers, especially when some labels are missing. Therefore, it is meaningful to study how to train neural networks using partial labels. In this work, we address the issue of label imbalance and investigate how to train classifiers using partial labels in large labeling spaces. First, we introduce the pseudo-labeling technique, which allows commonly adopted networks to be applied in partially labeled settings without the need for additional complex structures. Then, we propose a novel loss function that leverages statistical information from existing datasets to effectively alleviate the label imbalance problem. In addition, we design a dynamic training scheme to reduce the dimension of the labeling space and further mitigate the imbalance. Finally, we conduct extensive experiments on some publicly available multi-label datasets such as COCO, NUS-WIDE, CUB, and Open Images to demonstrate the effectiveness of the proposed approach. The results show that our approach outperforms several state-of-the-art methods, and surprisingly, in some partial labeling settings, our approach even exceeds the methods trained with full labels.
基于部分标注标签的非平衡大规模多标签分类研究
多标签分类是日常生活中经常遇到的问题,一个实例可能与多个类相关联。理论上,这是一种需要大量标注的监督式学习方法。然而,标注数据非常耗时,并且对于巨大的标注空间可能不可行。此外,标签不平衡会限制多标签分类器的性能,特别是当一些标签缺失时。因此,研究如何利用部分标签训练神经网络是很有意义的。在这项工作中,我们解决了标签不平衡的问题,并研究了如何在大标签空间中使用部分标签来训练分类器。首先,我们引入了伪标记技术,该技术允许通常采用的网络应用于部分标记的设置,而不需要额外的复杂结构。然后,我们提出了一种新的损失函数,利用现有数据集的统计信息来有效地缓解标签不平衡问题。此外,我们设计了一种动态训练方案来降低标注空间的维数,进一步缓解标注空间的不平衡。最后,我们在一些公开可用的多标签数据集(如COCO、NUS-WIDE、CUB和Open Images)上进行了大量实验,以证明所提出方法的有效性。结果表明,我们的方法优于几种最先进的方法,令人惊讶的是,在一些部分标记设置中,我们的方法甚至超过了使用完整标签训练的方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信