Discriminative local affine-hull clustering for high-dimensional data

Yu-Feng Yu, Jiali Luo, Xuanyi Chen, Yingchao Cheng, Yulin He, Joshua Zhexue Huang

Array, Volume 27 (2025), Article 100465. DOI: 10.1016/j.array.2025.100465
https://www.sciencedirect.com/science/article/pii/S259000562500092X
Abstract
Clustering high-dimensional data presents a critical technical challenge due to the curse of dimensionality, feature redundancy, and sensitivity to noise, issues that significantly degrade clustering accuracy in applications such as gene expression analysis, image recognition, and anomaly detection. Existing solutions often rely on dimensionality reduction techniques that risk discarding discriminative features, or on deep learning methods that require large-scale training data and suffer from poor interpretability. To address these limitations, this study proposes a novel discriminative subspace clustering algorithm that avoids traditional dimensionality reduction and instead operates directly in the high-dimensional space. Our method partitions the sample space into multiple local affine hulls and introduces a discriminative geometric distance metric that accounts for both relevant and irrelevant subspaces. Specifically, the model scores each query sample by the ratio of its distance to its class-specific affine hull to its distance from unrelated class subspaces. This dual-space modeling improves both intra-class compactness and inter-class separation. To ensure computational efficiency, we reformulate distance calculations as matrix multiplications and leverage SVD for subspace projection, enabling scalable performance across large datasets. Extensive experiments on seven benchmark datasets demonstrate that the proposed method consistently outperforms state-of-the-art clustering algorithms. It achieves up to 92.60% accuracy on MNIST and maintains high robustness on sparse and noisy data, validating its effectiveness for high-dimensional clustering tasks. This work contributes a geometrically interpretable and computationally efficient framework that closes a long-standing gap in unsupervised learning under high-dimensional constraints.
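The abstract specifies the geometry only at a high level. The sketch below illustrates the core quantities it describes, assuming Euclidean residual distances and an SVD-derived orthonormal basis for each local affine hull. The `energy` truncation threshold, all function names, and the particular ratio form (own-hull distance over nearest unrelated-hull distance) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def affine_hull_basis(X, energy=0.98):
    """Anchor point and orthonormal basis of the affine hull of rows of X.

    X: (n, d) samples of one local cluster. The SVD of the centered
    samples gives the hull's directions; truncating at an energy
    threshold (an assumed heuristic) discards noise directions.
    """
    mu = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    k = int(np.searchsorted(np.cumsum(s) / s.sum(), energy)) + 1 if s.sum() > 0 else 1
    return mu, Vt[:k].T  # mu: (d,), U: (d, k)

def hull_distance(q, mu, U):
    """Euclidean distance from query q to the affine hull (mu, U).

    The projection U @ (U.T @ r) is a plain matrix multiplication, in the
    spirit of the abstract's matmul reformulation, so batches of queries
    can be scored with a single matmul.
    """
    r = q - mu
    return np.linalg.norm(r - U @ (U.T @ r))

def discriminative_score(q, hulls, j):
    """Distance to hull j relative to the nearest unrelated hull.

    hulls: list of (mu, U) pairs. Small values mean q lies close to hull j
    and far from every other hull: the intra-class compactness versus
    inter-class separation trade-off described in the abstract. The
    paper's exact ratio may differ.
    """
    d_own = hull_distance(q, *hulls[j])
    d_other = min(hull_distance(q, *h) for i, h in enumerate(hulls) if i != j)
    return d_own / (d_other + 1e-12)
```

Given a tentative partition, one plausible use is an alternating loop: assign each sample to the hull minimizing this score, then refit the hull bases; the abstract does not spell out the update scheme, so that loop is conjecture on our part.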