Diversify and Conquer: Open-Set Disagreement for Robust Semi-Supervised Learning With Outliers

IF 8.9 | CAS Tier 1 (Computer Science) | JCR Q1, Computer Science, Artificial Intelligence
Heejo Kong;Sung-Jin Kim;Gunho Jung;Seong-Whan Lee
{"title":"Diversify and Conquer: Open-Set Disagreement for Robust Semi-Supervised Learning With Outliers","authors":"Heejo Kong;Sung-Jin Kim;Gunho Jung;Seong-Whan Lee","doi":"10.1109/TNNLS.2025.3547801","DOIUrl":null,"url":null,"abstract":"Conventional semi-supervised learning (SSL) ideally assumes that labeled and unlabeled data share an identical class distribution; however, in practice, this assumption is easily violated, as unlabeled data often includes unknown class data, i.e., outliers. The outliers are treated as noise, considerably degrading the performance of SSL models. To address this drawback, we propose a novel framework, diversify and conquer (DAC), to enhance SSL robustness in the context of open-set SSL (OSSL). In particular, we note that existing OSSL methods rely on prediction discrepancies between inliers and outliers from a single model trained on labeled data. This approach can be easily failed when the labeled data are insufficient, leading to performance degradation that is worse than naive SSL that do not account for outliers. In contrast, our approach exploits prediction disagreements among multiple models that are differently biased toward the unlabeled distribution. By leveraging the discrepancies arising from training on unlabeled data, our method enables robust outlier detection, even when the labeled data are underspecified. Our key contribution is constructing a collection of differently biased models through a single training process. By encouraging divergent heads to be differently biased toward outliers while making consistent predictions for inliers, we exploit the disagreement among these heads as a measure to identify unknown concepts. Extensive experiments demonstrate that our method significantly surpasses state-of-the-art OSSL methods across various protocols.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"36 6","pages":"9879-9892"},"PeriodicalIF":8.9000,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10944499","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10944499/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Conventional semi-supervised learning (SSL) ideally assumes that labeled and unlabeled data share an identical class distribution; however, in practice, this assumption is easily violated, as unlabeled data often includes unknown class data, i.e., outliers. The outliers are treated as noise, considerably degrading the performance of SSL models. To address this drawback, we propose a novel framework, diversify and conquer (DAC), to enhance SSL robustness in the context of open-set SSL (OSSL). In particular, we note that existing OSSL methods rely on prediction discrepancies between inliers and outliers from a single model trained on labeled data. This approach can easily fail when the labeled data are insufficient, leading to performance degradation worse than that of naive SSL methods that do not account for outliers. In contrast, our approach exploits prediction disagreements among multiple models that are differently biased toward the unlabeled distribution. By leveraging the discrepancies arising from training on unlabeled data, our method enables robust outlier detection, even when the labeled data are underspecified. Our key contribution is constructing a collection of differently biased models through a single training process. By encouraging divergent heads to be differently biased toward outliers while making consistent predictions for inliers, we exploit the disagreement among these heads as a measure to identify unknown concepts. Extensive experiments demonstrate that our method significantly surpasses state-of-the-art OSSL methods across various protocols.
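
The abstract describes the core mechanism: a shared backbone with several divergent heads trained in a single pass, kept consistent on labeled inliers but encouraged to disagree on unlabeled data, with their disagreement used as an outlier score. The sketch below only illustrates that general idea; the MultiHeadClassifier class, the pairwise-KL diversity term, the div_weight value, and the variance-based outlier_score are illustrative assumptions, not the paper's exact DAC objective.

```python
# Minimal sketch of disagreement-based outlier scoring with multiple heads.
# Assumes a shared backbone and K linear heads; names and loss terms are
# illustrative, not the published DAC formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadClassifier(nn.Module):
    """Shared feature extractor followed by K divergent classification heads."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int, num_heads: int = 4):
        super().__init__()
        self.backbone = backbone
        self.heads = nn.ModuleList([nn.Linear(feat_dim, num_classes) for _ in range(num_heads)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)                                     # (B, feat_dim)
        return torch.stack([h(feats) for h in self.heads], dim=1)    # (B, K, C)


def consistency_and_diversity_loss(logits_lab, labels, logits_unlab, div_weight=0.1):
    """Supervised term keeps all heads consistent on labeled inliers;
    a pairwise-KL term pushes them apart on unlabeled data that may contain outliers."""
    # Every head fits the labeled data -> agreement on inliers.
    sup = sum(F.cross_entropy(logits_lab[:, k], labels) for k in range(logits_lab.size(1)))

    # Encourage disagreement on unlabeled samples by maximizing mean pairwise KL.
    probs = logits_unlab.softmax(dim=-1)        # (B, K, C)
    logp = logits_unlab.log_softmax(dim=-1)
    K = probs.size(1)
    kl = 0.0
    for i in range(K):
        for j in range(K):
            if i != j:
                kl = kl + F.kl_div(logp[:, j], probs[:, i], reduction="batchmean")
    diversity = kl / (K * (K - 1))

    # Minimizing this total minimizes the supervised loss and maximizes diversity.
    return sup - div_weight * diversity


def outlier_score(logits: torch.Tensor) -> torch.Tensor:
    """Disagreement among heads as an open-set score: variance of head predictions."""
    probs = logits.softmax(dim=-1)              # (B, K, C)
    return probs.var(dim=1).sum(dim=-1)         # higher value -> more likely an outlier


# Example usage (hypothetical backbone mapping images to 128-d features):
# model = MultiHeadClassifier(backbone, feat_dim=128, num_classes=6, num_heads=4)
# scores = outlier_score(model(unlabeled_batch))
```

Any measure of inter-head spread (pairwise KL, prediction variance, or the gap between the entropy of the averaged prediction and the average per-head entropy) could serve as the disagreement score; the variance used here is simply the most compact choice for a sketch.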
Source Journal

IEEE Transactions on Neural Networks and Learning Systems
Categories: Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture
CiteScore: 23.80
Self-citation rate: 9.60%
Publication volume: 2102
Review time: 3-8 weeks
About the journal: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.