Gram–Schmidt Methods for Unsupervised Feature Extraction and Selection

IF 2.9 3区计算机科学 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Information Theory Pub Date : 2025-07-17 DOI:10.1109/TIT.2025.3589174

Bahram Yaghooti;Netanel Raviv;Bruno Sinopoli

{"title":"Gram–Schmidt Methods for Unsupervised Feature Extraction and Selection","authors":"Bahram Yaghooti;Netanel Raviv;Bruno Sinopoli","doi":"10.1109/TIT.2025.3589174","DOIUrl":null,"url":null,"abstract":"Feature extraction and selection in the presence of nonlinear dependencies among the data is a fundamental challenge in unsupervised learning. We propose using a Gram-Schmidt (GS) type orthogonalization process over function spaces to detect and map out such dependencies. Specifically, by applying the GS process over some family of functions, we construct a series of covariance matrices that can either be used to identify new large-variance directions, or to remove those dependencies from known directions. In the former case, we provide information-theoretic guarantees in terms of entropy reduction. In the latter, we provide precise conditions by which the chosen function family eliminates existing redundancy in the data. Each approach provides both a feature extraction and a feature selection algorithm. Our feature extraction methods are linear, and can be seen as natural generalization of principal component analysis (PCA). We provide experimental results for synthetic and real-world benchmark datasets which show superior performance over state-of-the-art (linear) feature extraction and selection algorithms. Surprisingly, our linear feature extraction algorithms are comparable and often outperform several important nonlinear feature extraction methods such as autoencoders, kernel PCA, and UMAP. Furthermore, one of our feature selection algorithms strictly generalizes a recent Fourier-based feature selection mechanism (Heidari et al., IEEE Transactions on Information Theory, 2022), yet at significantly reduced complexity.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 10","pages":"7856-7885"},"PeriodicalIF":2.9000,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Theory","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11083623/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Feature extraction and selection in the presence of nonlinear dependencies among the data is a fundamental challenge in unsupervised learning. We propose using a Gram-Schmidt (GS) type orthogonalization process over function spaces to detect and map out such dependencies. Specifically, by applying the GS process over some family of functions, we construct a series of covariance matrices that can either be used to identify new large-variance directions, or to remove those dependencies from known directions. In the former case, we provide information-theoretic guarantees in terms of entropy reduction. In the latter, we provide precise conditions by which the chosen function family eliminates existing redundancy in the data. Each approach provides both a feature extraction and a feature selection algorithm. Our feature extraction methods are linear, and can be seen as natural generalization of principal component analysis (PCA). We provide experimental results for synthetic and real-world benchmark datasets which show superior performance over state-of-the-art (linear) feature extraction and selection algorithms. Surprisingly, our linear feature extraction algorithms are comparable and often outperform several important nonlinear feature extraction methods such as autoencoders, kernel PCA, and UMAP. Furthermore, one of our feature selection algorithms strictly generalizes a recent Fourier-based feature selection mechanism (Heidari et al., IEEE Transactions on Information Theory, 2022), yet at significantly reduced complexity.

查看原文本刊更多论文

无监督特征提取和选择的Gram-Schmidt方法

在数据之间存在非线性依赖的情况下，特征提取和选择是无监督学习的一个基本挑战。我们建议在函数空间上使用Gram-Schmidt （GS）型正交化过程来检测和映射这种依赖关系。具体来说，通过在一些函数族上应用GS过程，我们构建了一系列协方差矩阵，这些协方差矩阵可以用来识别新的大方差方向，或者从已知方向中去除这些依赖关系。在前一种情况下，我们从熵减少的角度提供信息理论保证。在后者中，我们提供了所选函数族消除数据中现有冗余的精确条件。每种方法都提供了特征提取和特征选择算法。我们的特征提取方法是线性的，可以看作是主成分分析（PCA）的自然推广。我们提供了合成和现实世界基准数据集的实验结果，这些数据集显示出优于最先进（线性）特征提取和选择算法的性能。令人惊讶的是，我们的线性特征提取算法具有可比性，并且通常优于几种重要的非线性特征提取方法，如自编码器、核PCA和UMAP。此外，我们的特征选择算法之一严格推广了最近基于傅里叶的特征选择机制（Heidari等人，IEEE Transactions on Information Theory, 2022），但显著降低了复杂性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IEEE Transactions on Information Theory 工程技术-工程：电子与电气

CiteScore

5.70

自引率

20.00%

发文量

514

审稿时长

12 months

期刊介绍： The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.