Learning From PU Data Using Disentangled Representations
Omar Zamzam, Haleh Akrami, Mahdi Soltanolkotabi, Richard Leahy
Proceedings. International Conference on Image Processing, vol. 2025, pp. 1624-1629
DOI: 10.1109/icip55913.2025.11084723
Published: 2025-09-01 (Epub 2025-08-18)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12503129/pdf/
Citations: 0
Abstract
We address the problem of learning a binary classifier given partially labeled data in which all labeled samples come from a single class, commonly known as Positive Unlabeled (PU) learning. Classical methods such as clustering, out-of-distribution detection, and positive density estimation, while effective in low-dimensional scenarios, lose their efficacy as the dimensionality of the data increases because of the growing complexity. This has led to the development of methods that address the problem in high-dimensional spaces; however, many of these methods are also impacted by the increased complexity inherent in high-dimensional data. The contribution of this paper is the learning of a neural network-based data representation by employing a loss function that projects unlabeled data into two distinct clusters, positive and negative, facilitating their identification through basic clustering techniques and mirroring the simplicity of the problem seen in low-dimensional settings. We further enhance this separation of unlabeled data clusters by implementing a vector quantization strategy. Our experimental results on benchmark PU datasets validate the superiority of our method over existing state-of-the-art techniques. Additionally, we provide theoretical justification to support our cluster-based approach and algorithmic choices.
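The general idea described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the two-vector codebook, and the unweighted sum of the two loss terms are assumptions made here for clarity. Labeled positive embeddings are pulled toward one code vector, while each unlabeled embedding is quantized to its nearest of two code vectors (the vector quantization step), which encourages the unlabeled representations to split into two clusters that simple clustering can then separate.

```python
import numpy as np

def pu_vq_loss(z_pos, z_unl, codebook):
    """Hypothetical PU cluster-separation objective (illustrative only).

    z_pos    : (n_p, d) embeddings of labeled positive samples.
    z_unl    : (n_u, d) embeddings of unlabeled samples.
    codebook : (2, d) two code vectors; row 0 is treated as the
               'positive' cluster center, row 1 as 'negative'.
    Returns the scalar loss and the cluster assignment of each
    unlabeled embedding (0 = positive cluster, 1 = negative).
    """
    # Pull labeled positives toward the positive code vector.
    pos_term = np.mean(np.sum((z_pos - codebook[0]) ** 2, axis=1))

    # Squared distance of every unlabeled embedding to both codes: (n_u, 2).
    d = ((z_unl[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)

    # VQ-style assignment: each unlabeled point commits to its nearest code.
    assign = d.argmin(axis=1)
    vq_term = np.mean(d[np.arange(len(z_unl)), assign])

    return pos_term + vq_term, assign

# Tiny usage example with well-separated toy embeddings.
z_pos = np.array([[0.1, 0.0], [-0.1, 0.0]])
z_unl = np.array([[0.0, 0.1], [5.0, 4.9]])
codebook = np.array([[0.0, 0.0], [5.0, 5.0]])
loss, assign = pu_vq_loss(z_pos, z_unl, codebook)
print(assign)  # → [0 1]: one unlabeled point per cluster
```

In a full training loop the encoder producing the embeddings and the codebook would both be updated to minimize this loss; the sketch shows only the clustering geometry the loss induces.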