Learning From PU Data Using Disentangled Representations
Omar Zamzam, Haleh Akrami, Mahdi Soltanolkotabi, Richard Leahy
Proceedings. International Conference on Image Processing, vol. 2025, pp. 1624-1629
DOI: 10.1109/icip55913.2025.11084723
Published: 2025-09-01 (Epub 2025-08-18)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12503129/pdf/
Citations: 0
Abstract
We address the problem of learning a binary classifier given partially labeled data in which all labeled samples come from a single class, commonly known as Positive Unlabeled (PU) learning. Classical methods such as clustering, out-of-distribution detection, and positive density estimation, while effective in low-dimensional scenarios, lose their efficacy as the dimensionality of the data increases because of the growing complexity. This has led to the development of methods that address the problem in high-dimensional spaces; however, many of these methods are also impacted by the increased complexity inherent in high-dimensional data. The contribution of this paper is the learning of a neural network-based data representation by employing a loss function that projects unlabeled data into two distinct clusters, positive and negative, facilitating their identification through basic clustering techniques and mirroring the simplicity of the problem seen in low-dimensional settings. We further enhance this separation of unlabeled data clusters by implementing a vector quantization strategy. Our experimental results on benchmark PU datasets validate the superiority of our method over existing state-of-the-art techniques. Additionally, we provide theoretical justification to support our cluster-based approach and algorithmic choices.
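The general idea described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the two-vector codebook, and the unweighted sum of the two loss terms are assumptions made here for clarity. Labeled positive embeddings are pulled toward one code vector, while each unlabeled embedding is quantized to its nearest of two code vectors (the vector quantization step), which encourages the unlabeled representations to split into two clusters that simple clustering can then separate.

```python
import numpy as np

def pu_vq_loss(z_pos, z_unl, codebook):
    """Hypothetical PU cluster-separation objective (illustrative only).

    z_pos    : (n_p, d) embeddings of labeled positive samples.
    z_unl    : (n_u, d) embeddings of unlabeled samples.
    codebook : (2, d) two code vectors; row 0 is treated as the
               'positive' cluster center, row 1 as 'negative'.
    Returns the scalar loss and the cluster assignment of each
    unlabeled embedding (0 = positive cluster, 1 = negative).
    """
    # Pull labeled positives toward the positive code vector.
    pos_term = np.mean(np.sum((z_pos - codebook[0]) ** 2, axis=1))

    # Squared distance of every unlabeled embedding to both codes: (n_u, 2).
    d = ((z_unl[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)

    # VQ-style assignment: each unlabeled point commits to its nearest code.
    assign = d.argmin(axis=1)
    vq_term = np.mean(d[np.arange(len(z_unl)), assign])

    return pos_term + vq_term, assign

# Tiny usage example with well-separated toy embeddings.
z_pos = np.array([[0.1, 0.0], [-0.1, 0.0]])
z_unl = np.array([[0.0, 0.1], [5.0, 4.9]])
codebook = np.array([[0.0, 0.0], [5.0, 5.0]])
loss, assign = pu_vq_loss(z_pos, z_unl, codebook)
print(assign)  # → [0 1]: one unlabeled point per cluster
```

In a full training loop the encoder producing the embeddings and the codebook would both be updated to minimize this loss; the sketch shows only the clustering geometry the loss induces.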