Unsupervised Projected Sample Selector for Active Learning

IF 7.5 | Zone 3, Computer Science | Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Big Data | Pub Date: 2024-03-30 | DOI: 10.1109/TBDATA.2024.3407545

Yueyang Pi;Yiqing Shi;Shide Du;Yang Huang;Shiping Wang

IEEE Transactions on Big Data, vol. 11, no. 2, pp. 485-498. Full text: https://ieeexplore.ieee.org/document/10542380/

Citations: 0

Abstract

Active learning, as a technique, aims to effectively label specific data points while operating within a designated query budget. Nevertheless, the majority of unsupervised active learning algorithms are based on shallow linear representation and lack sufficient interpretability. Furthermore, certain diversity-based methods face challenges in selecting samples that adequately represent the entire data distribution. Inspired by these reasons, in this paper, we propose an unsupervised active learning method on orthogonal projections to construct a deep neural network model. By optimizing the orthogonal projection process, we establish the connection between projection and active learning, consequently enhancing the interpretability of the proposed method. The proposed method can efficiently project the feature space onto a spanned subspace, deriving an indicator matrix while calculating the projection loss. Moreover, we consider the redundancy among samples to ensure both data point diversity and enhancement of clustering-based algorithms. Through extensive comparative experiments on six public datasets, the results demonstrate that the proposed method can effectively select more informative and representative samples and improve performance by up to 11%.
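The projection idea in the abstract can be illustrated with a small sketch. This is not the authors' deep network model: it is a simple greedy stand-in, assuming that "projection loss" means the total squared residual left after orthogonally projecting every sample onto the span of the selected samples. The names `greedy_projection_select`, `deflate`, and `dot`, the greedy strategy, and the toy data are all hypothetical and chosen for illustration only.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def deflate(r, q):
    """Remove from r its component along the unit vector q."""
    c = dot(r, q)
    return [a - c * b for a, b in zip(r, q)]


def greedy_projection_select(samples, budget, tol=1e-12):
    """Greedily pick up to `budget` sample indices (illustrative assumption,
    not the paper's algorithm).

    Returns (selected, loss), where loss is the sum of squared residual
    norms after projecting every sample onto the span of the selected ones.
    """
    # Residuals start as the samples themselves (empty span).
    residuals = [[float(a) for a in x] for x in samples]
    selected = []
    for _ in range(budget):
        best_j, best_gain = None, tol
        for j, r_j in enumerate(residuals):
            norm_sq = dot(r_j, r_j)
            if norm_sq <= tol:
                continue  # already inside the current span: redundant
            # Loss reduction obtained by adding sample j to the span.
            gain = sum(dot(r_i, r_j) ** 2 for r_i in residuals) / norm_sq
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break  # every sample is already reconstructed
        r_best = residuals[best_j]
        norm = dot(r_best, r_best) ** 0.5
        q = [a / norm for a in r_best]  # new orthonormal direction
        residuals = [deflate(r, q) for r in residuals]
        selected.append(best_j)
    loss = sum(dot(r, r) for r in residuals)
    return selected, loss


# Toy data: two samples along x, two along y, one small sample along z.
samples = [[1, 0, 0], [2, 0, 0], [0, 1, 0], [0, 3, 0], [0, 0, 0.1]]
selected, loss = greedy_projection_select(samples, budget=2)
print(selected, loss)
```

In this sketch, redundancy is handled implicitly: once a sample is chosen, every sample lying in its span has zero residual and is skipped in later rounds, so the budget is spent on directions not yet covered.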


Source journal

IEEE Transactions on Big Data Multiple-

CiteScore: 11.80

Self-citation rate: 2.80%

Articles published: 114

Journal description: The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.