Unsupervised Projected Sample Selector for Active Learning

Impact Factor: 7.5 | Zone 3, Computer Science | JCR Q1: COMPUTER SCIENCE, INFORMATION SYSTEMS
Yueyang Pi;Yiqing Shi;Shide Du;Yang Huang;Shiping Wang
Journal: IEEE Transactions on Big Data, vol. 11, no. 2, pp. 485-498
DOI: 10.1109/TBDATA.2024.3407545
Publication date: 2024-03-30
Publication type: Journal Article
Open access: no
URL: https://ieeexplore.ieee.org/document/10542380/
Citation count: 0

Abstract

Active learning aims to select the most valuable data points for labeling within a fixed query budget. However, most unsupervised active learning algorithms rely on shallow linear representations and lack sufficient interpretability. Furthermore, some diversity-based methods struggle to select samples that adequately represent the entire data distribution. Motivated by these limitations, we propose an unsupervised active learning method that builds a deep neural network model on orthogonal projections. By optimizing the orthogonal projection process, we establish a connection between projection and active learning, which enhances the interpretability of the proposed method. The method efficiently projects the feature space onto a spanned subspace and derives an indicator matrix while computing the projection loss. Moreover, we account for redundancy among samples to ensure data point diversity and to enhance clustering-based algorithms. Extensive comparative experiments on six public datasets demonstrate that the proposed method selects more informative and representative samples and improves performance by up to 11%.
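
The abstract does not spell out the model, so the following is only a minimal NumPy sketch of the general idea it describes: choose a small set of samples whose span reconstructs the whole feature matrix well, and measure the quality of that choice by an orthogonal projection loss. It uses a classical greedy subset-selection heuristic, not the authors' deep network. The names select_by_projection, projection_loss, budget, and eps are hypothetical and introduced here for illustration only.

```python
import numpy as np

def select_by_projection(X, budget, eps=1e-12):
    """Greedily pick up to `budget` rows of X whose span best reconstructs all rows.

    Illustrative heuristic only; not the method proposed in the paper.
    """
    R = np.asarray(X, dtype=float).copy()   # residual of every sample (n x d)
    selected = []
    for _ in range(budget):
        G = R @ R.T                          # inner products between residuals
        norms = np.diag(G).copy()            # squared residual norms
        # Score of candidate j: how much total residual energy it would explain.
        scores = (G ** 2).sum(axis=0) / np.maximum(norms, eps)
        scores[norms <= eps] = -np.inf       # already (almost) in the current span
        if selected:
            scores[selected] = -np.inf       # never pick the same sample twice
        j = int(np.argmax(scores))
        if not np.isfinite(scores[j]):
            break                            # nothing left to explain
        q = R[j] / np.sqrt(norms[j])         # unit direction added to the subspace
        R -= np.outer(R @ q, q)              # remove that direction from all residuals
        selected.append(j)
    return selected

def projection_loss(X, selected):
    """Squared Frobenius error after projecting rows of X onto span(X[selected])."""
    X = np.asarray(X, dtype=float)
    B = X[selected]                          # k x d basis made of selected samples
    coeffs, *_ = np.linalg.lstsq(B.T, X.T, rcond=None)
    return float(np.linalg.norm(X.T - B.T @ coeffs) ** 2)

# Usage on synthetic data that lies in a 2-dimensional subspace of R^5.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 5))
picked = select_by_projection(X, budget=2)
indicator = np.zeros(len(X), dtype=int)      # binary selection indicator
indicator[picked] = 1
print(picked, projection_loss(X, picked))
```

Because each selected direction is removed from every residual, a sample that is nearly a combination of samples already chosen receives a low score, which is one simple way to discourage redundant selections.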
Source journal
CiteScore: 11.80
Self-citation rate: 2.80%
Articles published: 114
About the journal: The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.