Confidence-Based PU Learning With Instance-Dependent Label Noise
Xijia Tang, Chao Xu, Hong Tao, Xiaoyu Ma, Chenping Hou
IEEE Transactions on Neural Networks and Learning Systems, published 2025-03-24
DOI: https://doi.org/10.1109/TNNLS.2025.3549510
Citations: 0
Abstract
Positive and unlabeled (PU) learning, which trains binary classifiers using only positive and unlabeled data, has attracted considerable attention in recent years. Traditional PU learning typically assumes that all positive samples are labeled accurately. In practice, however, factors such as sample ambiguity and inadequate algorithms make label noise almost unavoidable. Current PU algorithms neglect label noise in the positive set, and in practical applications this noise is often biased toward certain instances rather than uniformly distributed. We define this important but understudied problem as PU learning with instance-dependent label noise (PUIDN). To eliminate the adverse impact of instance-dependent noise (IDN), we leverage a confidence score for each instance in the positive set, which connects samples to labels without any assumption on the noise distribution. We then propose an unbiased estimator of the classification risk that uses both label and confidence information and can be computed directly from PUIDN data and their confidence scores. Moreover, our classification framework integrates an alternating-iteration optimization strategy based on the correlation between different kinds of confidence information, thereby alleviating the need for additional training data. Theoretically, we derive a generalization error bound for the proposed method. Experimentally, numerical results of various types demonstrate the effectiveness of our approach.
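The abstract does not state the proposed estimator, so the following is only an illustrative sketch of the general idea and not the authors' method. It takes the classic unbiased PU risk, prior * E_P[l(f(x), +1)] + E_U[l(f(x), -1)] - prior * E_P[l(f(x), -1)], and replaces the plain averages over the labeled-positive set with confidence-weighted averages. The weighting scheme, the sigmoid loss, and the names `sigmoid_loss`, `confidence_weighted_pu_risk`, `conf_p`, and `prior` are all assumptions introduced here; the sketch further assumes each confidence score is the probability that a labeled-positive sample is truly positive, and that the class prior is known.

```python
import numpy as np


def sigmoid_loss(margin):
    """Sigmoid loss l(z) = 1 / (1 + exp(z)), where z = y * f(x)."""
    return 1.0 / (1.0 + np.exp(margin))


def confidence_weighted_pu_risk(scores_p, conf_p, scores_u, prior):
    """Hypothetical confidence-weighted variant of the classic PU risk.

    scores_p : classifier outputs f(x) on the (possibly noisy) positive set
    conf_p   : assumed confidence that each labeled positive is truly positive
    scores_u : classifier outputs f(x) on the unlabeled set
    prior    : assumed class prior P(y = +1)
    """
    scores_p = np.asarray(scores_p, dtype=float)
    conf_p = np.asarray(conf_p, dtype=float)
    scores_u = np.asarray(scores_u, dtype=float)

    # Self-normalized weights: low-confidence positives contribute less.
    weights = conf_p / conf_p.sum()

    # Weighted loss of positives treated as positive (y = +1).
    pos_as_pos = np.sum(weights * sigmoid_loss(scores_p))
    # Weighted loss of positives treated as negative (y = -1).
    pos_as_neg = np.sum(weights * sigmoid_loss(-scores_p))
    # Plain average loss of unlabeled samples treated as negative.
    unl_as_neg = np.mean(sigmoid_loss(-scores_u))

    return prior * pos_as_pos + unl_as_neg - prior * pos_as_neg


# Example call with made-up scores and confidences.
risk = confidence_weighted_pu_risk(
    scores_p=[2.0, 1.5, -0.5],
    conf_p=[0.9, 0.8, 0.2],
    scores_u=[0.3, -1.0, -2.0, 1.2],
    prior=0.4,
)
```

When every confidence equals 1 the function reduces to the classic PU risk, and a sample with confidence 0 has no influence on the result. Self-normalized weighting is consistent rather than exactly unbiased, which is one more way this sketch may differ from the estimator in the paper.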
About the journal:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.