Title: Pseudo labels approach to interpretable self-guided subspace clustering
Author: Ivica Kopriva
Journal: Pattern Recognition, volume 172, Article 112618
Publication date: 2025-10-15
DOI: 10.1016/j.patcog.2025.112618
URL: https://www.sciencedirect.com/science/article/pii/S0031320325012816
Impact factor: 7.6; JCR: Q1 (Computer Science, Artificial Intelligence)
Citation count: 0
Abstract
Most subspace clustering (SC) algorithms depend on one or more hyperparameters that must be tuned to achieve high clustering performance. Tuning is often performed by grid search, assuming that a held-out set is available. In some domains, such as medicine, this assumption frequently does not hold. To address this problem, we propose an approach to label-independent hyperparameter optimization: the SC algorithm is applied to the data, and the resulting cluster assignments are used as pseudo-labels to compute clustering quality metrics (e.g., accuracy (ACC) or normalized mutual information (NMI)) across a predefined hyperparameter grid. Assuming that ACC (or NMI) is a smooth function of the hyperparameter values, it is possible to select subintervals of hyperparameters, which are then iteratively split into halves or thirds until a relative error criterion is satisfied. In principle, the hyperparameters of any SC algorithm can be tuned using the proposed method. We demonstrate this approach on five single-view SC algorithms and two multi-view SC algorithms, comparing the achieved performance with their oracle versions across six datasets for single-view algorithms and three datasets for multi-view algorithms. The proposed method typically achieves clustering performance that is up to 7% lower than that of the oracle versions. We also enhance the interpretability of the proposed method by visualizing subspace bases estimated from the computed clustering partitions. This aids in the initial selection of the hyperparameter search space.
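The abstract does not spell out the exact selection rule, so the following is only a minimal sketch of one possible reading of the idea, not the author's algorithm. It assumes a single scalar hyperparameter, uses the partition computed at an interval's midpoint as pseudo-labels, scores the two endpoint partitions against it with NMI, keeps the half whose endpoint agrees better, and stops once the score changes by less than a relative tolerance. The names `refine_interval`, `cluster_fn`, `rel_tol`, and the toy `threshold_clusters` routine are hypothetical and stand in for a real SC algorithm.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information of two partitions (arithmetic-mean normalization)."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    mi = sum(
        c / n * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )
    mean_entropy = 0.5 * (h_a + h_b)
    # Two single-cluster partitions are identical, so report full agreement.
    return 1.0 if mean_entropy == 0 else mi / mean_entropy


def refine_interval(cluster_fn, data, lo, hi, rel_tol=0.01, max_iter=20):
    """Halve [lo, hi] repeatedly, guided by NMI against midpoint pseudo-labels.

    Assumption (not from the paper): the half whose endpoint partition agrees
    better with the midpoint partition is kept at every step.
    """
    labels_lo = cluster_fn(data, lo)
    labels_hi = cluster_fn(data, hi)
    prev_score = None
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        pseudo = cluster_fn(data, mid)  # pseudo-labels; no ground truth is used
        score_lo = nmi(pseudo, labels_lo)
        score_hi = nmi(pseudo, labels_hi)
        if score_lo >= score_hi:
            hi, labels_hi, score = mid, pseudo, score_lo
        else:
            lo, labels_lo, score = mid, pseudo, score_hi
        # Relative-change stopping rule (hypothetical form of the criterion).
        if prev_score is not None and abs(score - prev_score) <= rel_tol * max(
            abs(prev_score), 1e-12
        ):
            break
        prev_score = score
    return 0.5 * (lo + hi)


def threshold_clusters(points, lam):
    """Toy stand-in for an SC algorithm: split 1-D points at threshold lam."""
    return [int(p > lam) for p in points]


points = [0, 1, 2, 3, 10, 11, 12, 13]
lam = refine_interval(threshold_clusters, points, lo=0.5, hi=20.0)
print(f"selected threshold: {lam:.3f}")
```

Splitting into thirds instead of halves would follow the same pattern with two interior points per step. A real use would replace `threshold_clusters` with an actual SC routine and could swap NMI for ACC.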
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.