Confidence-Based PU Learning With Instance-Dependent Label Noise.

IF 10.2 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Xijia Tang, Chao Xu, Hong Tao, Xiaoyu Ma, Chenping Hou
{"title":"Confidence-Based PU Learning With Instance-Dependent Label Noise.","authors":"Xijia Tang, Chao Xu, Hong Tao, Xiaoyu Ma, Chenping Hou","doi":"10.1109/TNNLS.2025.3549510","DOIUrl":null,"url":null,"abstract":"<p><p>Positive and unlabeled (PU) learning, which trains binary classifiers using only PU data, has gained vast attentions in recent years. Traditional PU learning often assumes that all the positive samples are labeled accurately. Nevertheless, due to the reasons such as sample ambiguity and insufficient algorithms, label noise is almost unavoidable in this scenario. Current PU algorithms neglect the label noise issue in the positive set, which is often biased toward certain instances rather than being uniformly distributed in practical applications. We define this important but understudied problem as PU learning with instance-dependent label noise (PUIDN). To eliminate the adverse impact of IDN, we leverage confidence scores for each instance in the positive set, which establish the connection between samples and labels without any assumption on noise distribution. Then, we propose an unbiased estimator for classification risk considering both label and confidence information, which can be computed immediately from PUIDN data along with their confidence scores. Moreover, our classification framework integrates an optimization strategy of alternating iteration based on the correlation between different confidence information, thereby alleviating the additional requirement for training data. Theoretically, we derive a generalization error bound for our proposed method. Experimentally, the effectiveness of our approach is demonstrated through various types of numerical results.</p>","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"PP ","pages":""},"PeriodicalIF":10.2000,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/TNNLS.2025.3549510","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Positive and unlabeled (PU) learning, which trains binary classifiers using only PU data, has gained vast attentions in recent years. Traditional PU learning often assumes that all the positive samples are labeled accurately. Nevertheless, due to the reasons such as sample ambiguity and insufficient algorithms, label noise is almost unavoidable in this scenario. Current PU algorithms neglect the label noise issue in the positive set, which is often biased toward certain instances rather than being uniformly distributed in practical applications. We define this important but understudied problem as PU learning with instance-dependent label noise (PUIDN). To eliminate the adverse impact of IDN, we leverage confidence scores for each instance in the positive set, which establish the connection between samples and labels without any assumption on noise distribution. Then, we propose an unbiased estimator for classification risk considering both label and confidence information, which can be computed immediately from PUIDN data along with their confidence scores. Moreover, our classification framework integrates an optimization strategy of alternating iteration based on the correlation between different confidence information, thereby alleviating the additional requirement for training data. Theoretically, we derive a generalization error bound for our proposed method. Experimentally, the effectiveness of our approach is demonstrated through various types of numerical results.

基于置信度的PU学习与实例相关的标签噪声。
正未标记学习(Positive and unlabelled learning, PU)是一种仅使用PU数据训练二分类器的学习方法,近年来得到了广泛的关注。传统的PU学习通常假设所有阳性样本都被准确标记。然而,由于样本模糊和算法不足等原因,在这种情况下,标签噪声几乎是不可避免的。目前的PU算法忽略了正集中的标签噪声问题,在实际应用中往往偏向于某些实例,而不是均匀分布。我们将这个重要但尚未得到充分研究的问题定义为具有实例相关标签噪声(PUIDN)的PU学习。为了消除IDN的不利影响,我们利用了正集中每个实例的置信度得分,这在不假设噪声分布的情况下建立了样本和标签之间的联系。然后,我们提出了一个考虑标签和置信度信息的分类风险无偏估计,该估计可以立即从PUIDN数据及其置信度得分中计算出来。此外,我们的分类框架集成了基于不同置信度信息之间相关性的交替迭代优化策略,从而减轻了对训练数据的额外要求。理论上,我们推导出了该方法的泛化误差界。实验中,通过各种类型的数值结果证明了我们方法的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
IEEE transactions on neural networks and learning systems
IEEE transactions on neural networks and learning systems COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
CiteScore
23.80
自引率
9.60%
发文量
2102
审稿时长
3-8 weeks
期刊介绍: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信