{"title":"Self-Labeling and Self-Knowledge Distillation Unsupervised Feature Selection","authors":"Yunzhi Ling;Feiping Nie;Weizhong Yu;Xuelong Li","doi":"10.1109/TKDE.2025.3561046","DOIUrl":null,"url":null,"abstract":"This paper proposes a deep pseudo-label method for unsupervised feature selection, which learns non-linear representations to generate pseudo-labels and trains a Neural Network (NN) to select informative features via self-Knowledge Distillation (KD). Specifically, the proposed method divides a standard NN into two sub-components: an encoder and a predictor, and introduces a dependency subnet. It works by self-supervised pre-training the encoder to produce informative representations and then alternating between two steps: (1) learning pseudo-labels by combining the clustering results of the encoder's outputs with the NN's prediction outputs, and (2) updating the NN's parameters by globally selecting a subset of features to predict the pseudo-labels while updating the subnet's parameters through self-KD. Self-KD is achieved by encouraging the subnet to locally capture a subset of the NN features to produce class probabilities that match those produced by the NN. This allows the model to self-absorb the learned inter-class knowledge and evaluate feature diversity, removing redundant features without sacrificing performance. Meanwhile, the potential discriminative capability of a NN can also be self-excavated without the assistance of other NNs. The two alternate steps reinforce each other: in step (2), by predicting the learned pseudo-labels and conducting self-KD, the discrimination of the outputs of both the NN and the encoder is gradually enhanced, while the self-labeling method in step (1) leverages these two improvements to further refine the pseudo-labels for step (2), resulting in the superior performance. Extensive experiments show the proposed method significantly outperforms state-of-the-art methods across various datasets.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 7","pages":"4270-4284"},"PeriodicalIF":8.9000,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Knowledge and Data Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10972142/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
This paper proposes a deep pseudo-label method for unsupervised feature selection, which learns non-linear representations to generate pseudo-labels and trains a Neural Network (NN) to select informative features via self-Knowledge Distillation (KD). Specifically, the proposed method divides a standard NN into two sub-components, an encoder and a predictor, and introduces a dependency subnet. It works by pre-training the encoder in a self-supervised manner to produce informative representations and then alternating between two steps: (1) learning pseudo-labels by combining the clustering results of the encoder's outputs with the NN's prediction outputs, and (2) updating the NN's parameters by globally selecting a subset of features to predict the pseudo-labels while updating the subnet's parameters through self-KD. Self-KD is achieved by encouraging the subnet to locally capture a subset of the NN features and produce class probabilities that match those produced by the NN. This allows the model to self-absorb the learned inter-class knowledge and evaluate feature diversity, removing redundant features without sacrificing performance. Meanwhile, the potential discriminative capability of an NN can also be self-excavated without the assistance of other NNs. The two alternating steps reinforce each other: in step (2), predicting the learned pseudo-labels and conducting self-KD gradually enhances the discriminative power of the outputs of both the NN and the encoder, while the self-labeling method in step (1) leverages these two improvements to further refine the pseudo-labels for step (2), resulting in superior performance. Extensive experiments show the proposed method significantly outperforms state-of-the-art methods across various datasets.
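To make the alternating procedure in the abstract concrete, the following is a minimal PyTorch sketch of the two-step loop: step (1) fuses k-means clustering of the encoder's outputs with the NN's own predictions to form pseudo-labels, and step (2) trains the gated NN to predict them while a small subnet matches the NN's softened class probabilities (self-KD). All module names (GatedNN, Subnet), the sigmoid feature gate, the random fixed subset seen by the subnet, the label-fusion rule, and the hyperparameters are illustrative assumptions rather than the authors' implementation; the self-supervised pre-training of the encoder is omitted for brevity.

```python
# Illustrative sketch only; not the paper's exact architecture or objective.
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans

class GatedNN(nn.Module):
    """NN split into encoder + predictor, with a learnable global gate
    (one score per input feature) used for global feature selection."""
    def __init__(self, d_in, d_hid, n_clusters):
        super().__init__()
        self.gate = nn.Parameter(torch.ones(d_in))        # global feature scores (assumed parameterisation)
        self.encoder = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU())
        self.predictor = nn.Linear(d_hid, n_clusters)
    def forward(self, x):
        z = self.encoder(x * torch.sigmoid(self.gate))    # globally re-weighted features
        return z, self.predictor(z)

class Subnet(nn.Module):
    """Dependency subnet: sees only a local subset of the features and is
    trained to reproduce the NN's class probabilities (self-KD student)."""
    def __init__(self, idx, d_hid, n_clusters):
        super().__init__()
        self.idx = idx                                    # indices of the local feature subset (here fixed at random)
        self.net = nn.Sequential(nn.Linear(len(idx), d_hid), nn.ReLU(),
                                 nn.Linear(d_hid, n_clusters))
    def forward(self, x):
        return self.net(x[:, self.idx])

def pseudo_labels(z, logits, n_clusters):
    """Step (1): combine clustering of encoder outputs with NN predictions."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(z.detach().cpu().numpy())
    cluster_lab = torch.as_tensor(km.labels_, device=z.device, dtype=torch.long)
    pred_lab = logits.argmax(dim=1)
    # Simplified fusion rule: keep the NN's label where it agrees with the
    # clustering, otherwise fall back to the cluster assignment.
    return torch.where(pred_lab == cluster_lab, pred_lab, cluster_lab)

def train(X, n_clusters=10, rounds=20, inner_steps=50, tau=2.0):
    d_in = X.shape[1]
    model = GatedNN(d_in, 128, n_clusters)
    subnet = Subnet(torch.randperm(d_in)[: d_in // 4], 64, n_clusters)
    opt = torch.optim.Adam(list(model.parameters()) + list(subnet.parameters()), lr=1e-3)
    for _ in range(rounds):
        z, logits = model(X)
        y = pseudo_labels(z, logits, n_clusters)          # step (1): self-labeling
        for _ in range(inner_steps):                      # step (2): selection + self-KD
            z, logits = model(X)
            ce = F.cross_entropy(logits, y)               # predict the pseudo-labels
            kd = F.kl_div(F.log_softmax(subnet(X) / tau, dim=1),
                          F.softmax(logits.detach() / tau, dim=1),
                          reduction="batchmean") * tau ** 2
            opt.zero_grad()
            (ce + kd).backward()
            opt.step()
    return torch.sigmoid(model.gate)                      # per-feature scores; top-k gives the selected features
```

Under these assumptions, ranking features by the returned gate scores and keeping the top k plays the role of the final feature-selection step; the self-KD term is what lets the locally restricted subnet absorb the NN's inter-class knowledge, as described above.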
Journal description:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.