Learning From PU Data Using Disentangled Representations.

Omar Zamzam, Haleh Akrami, Mahdi Soltanolkotabi, Richard Leahy
{"title":"使用解纠缠表示从PU数据中学习。","authors":"Omar Zamzam, Haleh Akrami, Mahdi Soltanolkotabi, Richard Leahy","doi":"10.1109/icip55913.2025.11084723","DOIUrl":null,"url":null,"abstract":"<p><p>We address the problem of learning a binary classifier given partially labeled data where all labeled samples come from only one of the classes, commonly known as Positive Unlabeled (PU) learning. Classical methods such as clustering, out-of-distribution detection, and positive density estimation, while effective in low-dimensional scenarios, lose their efficacy as the dimensionality of data increases, because of the increasing complexity. This has led to the development of methods that address the problem in high-dimensional spaces; however, many of these methods are also impacted by the increased complexity inherent in high-dimensional data. The contribution of this paper is the learning of a neural network-based data representation by employing a loss function that enables the projection of unlabeled data into two distinct clusters - positive and negative - facilitating their identification through basic clustering techniques and mirroring the simplicity of the problem seen in low-dimensional settings. We further enhance this separation of unlabeled data clusters by implementing a vector quantization strategy. Our experimental results on benchmarking PU datasets validate the superiority of our method over existing state-of-the-art techniques. Additionally, we provide theoretical justification to support our cluster-based approach and algorithmic choices.</p>","PeriodicalId":74572,"journal":{"name":"Proceedings. International Conference on Image Processing","volume":"2025 ","pages":"1624-1629"},"PeriodicalIF":0.0000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12503129/pdf/","citationCount":"0","resultStr":"{\"title\":\"Learning From PU Data Using Disentangled Representations.\",\"authors\":\"Omar Zamzam, Haleh Akrami, Mahdi Soltanolkotabi, Richard Leahy\",\"doi\":\"10.1109/icip55913.2025.11084723\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>We address the problem of learning a binary classifier given partially labeled data where all labeled samples come from only one of the classes, commonly known as Positive Unlabeled (PU) learning. Classical methods such as clustering, out-of-distribution detection, and positive density estimation, while effective in low-dimensional scenarios, lose their efficacy as the dimensionality of data increases, because of the increasing complexity. This has led to the development of methods that address the problem in high-dimensional spaces; however, many of these methods are also impacted by the increased complexity inherent in high-dimensional data. The contribution of this paper is the learning of a neural network-based data representation by employing a loss function that enables the projection of unlabeled data into two distinct clusters - positive and negative - facilitating their identification through basic clustering techniques and mirroring the simplicity of the problem seen in low-dimensional settings. We further enhance this separation of unlabeled data clusters by implementing a vector quantization strategy. Our experimental results on benchmarking PU datasets validate the superiority of our method over existing state-of-the-art techniques. 
Additionally, we provide theoretical justification to support our cluster-based approach and algorithmic choices.</p>\",\"PeriodicalId\":74572,\"journal\":{\"name\":\"Proceedings. International Conference on Image Processing\",\"volume\":\"2025 \",\"pages\":\"1624-1629\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12503129/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. International Conference on Image Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/icip55913.2025.11084723\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/8/18 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. International Conference on Image Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icip55913.2025.11084723","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/8/18 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract


We address the problem of learning a binary classifier given partially labeled data where all labeled samples come from only one of the classes, commonly known as Positive Unlabeled (PU) learning. Classical methods such as clustering, out-of-distribution detection, and positive density estimation, while effective in low-dimensional scenarios, lose their efficacy as the dimensionality of data increases, because of the increasing complexity. This has led to the development of methods that address the problem in high-dimensional spaces; however, many of these methods are also impacted by the increased complexity inherent in high-dimensional data. The contribution of this paper is the learning of a neural network-based data representation by employing a loss function that enables the projection of unlabeled data into two distinct clusters - positive and negative - facilitating their identification through basic clustering techniques and mirroring the simplicity of the problem seen in low-dimensional settings. We further enhance this separation of unlabeled data clusters by implementing a vector quantization strategy. Our experimental results on benchmarking PU datasets validate the superiority of our method over existing state-of-the-art techniques. Additionally, we provide theoretical justification to support our cluster-based approach and algorithmic choices.
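The abstract describes the approach only at a high level, so the following is a minimal, hypothetical sketch of the general idea rather than the authors' method: an MLP encoder with a two-prototype vector-quantization codebook, where labeled positives are pulled toward one prototype and unlabeled samples commit to their nearest prototype, encouraging the unlabeled embeddings to split into two clusters that can then be read off by nearest-prototype assignment. The class and function names (PUEncoder, pu_loss), the loss terms, and all hyperparameters are assumptions made for illustration; PyTorch is assumed as the framework.

```python
# Hypothetical sketch (not the paper's implementation): an MLP encoder with a
# two-prototype vector-quantization codebook. Labeled positives are pulled toward
# prototype 0; unlabeled samples commit to their nearest prototype via VQ-VAE-style
# terms, which tends to split the unlabeled embeddings into two clusters.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PUEncoder(nn.Module):
    def __init__(self, in_dim: int, emb_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, emb_dim),
        )
        # Two prototypes: index 0 = "positive" cluster, index 1 = "negative" cluster.
        self.codebook = nn.Parameter(torch.randn(2, emb_dim))

    def forward(self, x):
        z = self.net(x)                             # continuous embeddings
        dists = torch.cdist(z, self.codebook)       # (batch, 2) distances to prototypes
        idx = dists.argmin(dim=1)                   # hard assignment (vector quantization)
        return z, idx


def pu_loss(model, x_pos, x_unl, beta: float = 0.25):
    """Illustrative loss, not the loss function proposed in the paper."""
    z_p, _ = model(x_pos)
    z_u, idx_u = model(x_unl)
    pos_proto = model.codebook[0].expand_as(z_p)
    e_u = model.codebook[idx_u]                     # nearest prototype per unlabeled sample
    # Pull labeled positives and the positive prototype toward each other.
    loss_pos = F.mse_loss(z_p, pos_proto.detach()) + F.mse_loss(pos_proto, z_p.detach())
    # VQ-style codebook and commitment terms on the unlabeled batch.
    loss_unl = F.mse_loss(e_u, z_u.detach()) + beta * F.mse_loss(z_u, e_u.detach())
    return loss_pos + loss_unl


if __name__ == "__main__":
    torch.manual_seed(0)
    model = PUEncoder(in_dim=10)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x_pos = torch.randn(128, 10) + 2.0              # toy positive samples
    x_unl = torch.cat([torch.randn(64, 10) + 2.0,   # toy unlabeled mixture of both classes
                       torch.randn(64, 10) - 2.0])
    for _ in range(300):
        opt.zero_grad()
        pu_loss(model, x_pos, x_unl).backward()
        opt.step()
    # Unlabeled samples are then labeled by their nearest prototype (basic clustering).
    _, pred = model(x_unl)
    print("predicted positives among unlabeled:", (pred == 0).sum().item())
```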
