Unsupervised Projected Sample Selector for Active Learning
Yueyang Pi; Yiqing Shi; Shide Du; Yang Huang; Shiping Wang
IEEE Transactions on Big Data, vol. 11, no. 2, pp. 485-498. Publication date: 2024-03-30. DOI: 10.1109/TBDATA.2024.3407545
https://ieeexplore.ieee.org/document/10542380/
Citation count: 0
Abstract
Active learning aims to label selected data points effectively within a fixed query budget. However, most unsupervised active learning algorithms rely on shallow linear representations and lack sufficient interpretability. Furthermore, some diversity-based methods struggle to select samples that adequately represent the entire data distribution. Motivated by these limitations, we propose an unsupervised active learning method that builds a deep neural network model on orthogonal projections. By optimizing the orthogonal projection process, we establish a connection between projection and active learning, which enhances the interpretability of the proposed method. The method efficiently projects the feature space onto a spanned subspace and derives an indicator matrix while computing the projection loss. Moreover, we account for redundancy among samples to ensure data point diversity and to enhance clustering-based algorithms. Extensive comparative experiments on six public datasets demonstrate that the proposed method selects more informative and representative samples and improves performance by up to 11%.
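The idea of scoring a candidate subset by its projection loss can be illustrated with a small NumPy sketch. This is not the deep network proposed in the paper: it is a shallow, greedy linear stand-in, and the function names `projection_loss` and `greedy_select`, the row-per-sample layout of `X`, and the toy data are all assumptions made for illustration. It shows only how samples might be chosen so that the subspace they span reconstructs the remaining data with small residual, and how a binary indicator vector can record the selection.

```python
import numpy as np


def projection_loss(X, selected):
    """Squared Frobenius residual after projecting every row of X
    onto the subspace spanned by the selected rows (assumed layout:
    one sample per row)."""
    S = X[selected]                     # (k, d) selected samples
    P = np.linalg.pinv(S) @ S           # (d, d) orthogonal projector onto row space of S
    residual = X - X @ P                # part of each sample outside the spanned subspace
    return float(np.sum(residual ** 2))


def greedy_select(X, budget):
    """Greedy illustration: at each step add the sample whose inclusion
    gives the smallest projection loss. Hypothetical helper, not the
    paper's optimization procedure."""
    selected = []
    for _ in range(budget):
        candidates = [i for i in range(X.shape[0]) if i not in selected]
        losses = [projection_loss(X, selected + [i]) for i in candidates]
        selected.append(candidates[int(np.argmin(losses))])
    return selected


# Toy data (assumed): four samples lying along two orthogonal directions.
X = np.array([[1.0, 0.0, 0.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 3.0, 0.0]])

chosen = greedy_select(X, budget=2)

# Binary indicator over samples: 1 where a sample was selected for labeling.
indicator = np.zeros(X.shape[0], dtype=int)
indicator[chosen] = 1
final_loss = projection_loss(X, chosen)
```

On this toy data, a budget of two should pick one sample from each direction, because a second sample along an already covered direction adds nothing to the spanned subspace. That is a simple form of the redundancy consideration mentioned in the abstract.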
About the journal:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.