Optimizing clustering-based analytical methods with trimmed and sparse clustering
José Antonio Bernabé-Díaz, Manuel Franco, Juana-María Vivo, Jesualdo Tomás Fernández-Breis
Computers in Biology and Medicine, volume 194, Article 110436. Published 2025-06-16. Journal Article.
DOI: 10.1016/j.compbiomed.2025.110436
https://www.sciencedirect.com/science/article/pii/S0010482525007875
Abstract
Clustering is an essential tool in biomedical research, often used to identify patterns and subgroups within complex, high-dimensional datasets such as gene expression profiles, metabolomics data, and patient stratification data. However, finding the optimal number of clusters and other input parameters, such as the trimming proportion and the sparsity level, is a challenging task. Traditional clustering methods may struggle with the noise, outliers, redundancy, and high dimensionality that are common in biomedical applications, leading to unreliable or biologically uninterpretable results.
Sparse clustering methods help by emphasizing significant features while suppressing noise, and trimmed clustering can enhance robustness by excluding outliers. Yet existing approaches often require manual tuning of parameters such as the trimming proportion and the sparsity level, a process that can be time-consuming and dependent on trial and error.
To address these limitations, this work presents an automated trimmed and sparse clustering method that determines both the optimal number of clusters and the necessary tuning parameters. The method is available to the biomedical community through the evaluomeR package, which enables researchers to apply sophisticated clustering efficiently without an extensive computational background. This advancement not only increases the usability of trimmed and sparse clustering, but also promotes reproducibility and accuracy in data-driven biomedical discoveries.
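To make the trimming idea in the abstract concrete, the following is a minimal sketch of a generic trimmed k-means loop in Python. It is not the authors' method and not the evaluomeR API: the function name `trimmed_kmeans`, the `init_centers` parameter, and the toy data are all hypothetical, chosen only for illustration. The number of clusters and the trimming proportion `alpha` are fixed by hand here, which is exactly the manual tuning step that the method described above aims to automate.

```python
import numpy as np


def trimmed_kmeans(X, init_centers, alpha, n_iter=100):
    """Illustrative trimmed k-means (hypothetical helper, not evaluomeR).

    X            : (n, p) data matrix
    init_centers : (k, p) starting centers, supplied by the caller
    alpha        : trimming proportion in [0, 1)
    Returns (labels, centers); trimmed points receive the label -1.
    """
    X = np.asarray(X, dtype=float)
    centers = np.array(init_centers, dtype=float)
    n = X.shape[0]
    n_keep = n - int(np.floor(alpha * n))  # points that survive trimming
    labels = np.full(n, -1)

    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        min_d2 = d2.min(axis=1)

        # Keep the n_keep points closest to their center; trim the rest.
        keep = np.argsort(min_d2, kind="stable")[:n_keep]
        new_labels = np.full(n, -1)
        new_labels[keep] = nearest[keep]

        # Recompute each center from its untrimmed members only.
        for j in range(centers.shape[0]):
            members = X[new_labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)

        # Stop once the assignment (including the trimmed set) is stable.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels

    return labels, centers


# Toy data, made up for illustration: two tight groups plus two gross outliers.
group_a = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
group_b = group_a + 10.0
outliers = np.array([[100.0, -100.0], [-100.0, 100.0]])
X = np.vstack([group_a, group_b, outliers])

# With n = 10 and alpha = 0.25, floor(2.5) = 2 points are trimmed.
labels, centers = trimmed_kmeans(X, init_centers=[[0, 0], [10, 10]], alpha=0.25)
print(labels.tolist())  # [0, 0, 0, 0, 1, 1, 1, 1, -1, -1]
```

In this toy run the two outliers are trimmed, so they do not pull the cluster centers away from the two groups. The sketch omits the sparsity component (feature weighting) and any automatic selection of the number of clusters or `alpha`.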
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.