Optimizing clustering-based analytical methods with trimmed and sparse clustering

IF 7 · Zone 2 (Medicine) · JCR Q1 Biology

Computers in biology and medicine Pub Date : 2025-06-16 DOI:10.1016/j.compbiomed.2025.110436

José Antonio Bernabé-Díaz, Manuel Franco, Juana-María Vivo, Jesualdo Tomás Fernández-Breis

Computers in Biology and Medicine, Volume 194, Article 110436. Journal article. Publisher page: https://www.sciencedirect.com/science/article/pii/S0010482525007875

Citations: 0

Abstract

Clustering is an essential tool in biomedical research, often used to identify patterns and subgroups within complex, high-dimensional datasets, such as gene expression profiles, metabolomics, and patient stratification data. However, finding the optimal number of clusters and other input parameters, such as the trimming proportion and the sparsity level, is a challenging task. Traditional clustering methods may struggle to handle noise, outliers, redundancy, and high dimensionality, all of which are common in biomedical applications, leading to unreliable or biologically uninterpretable results.

Sparse clustering methods help by emphasizing significant features while suppressing noise, and trimmed clustering can enhance robustness by excluding outliers. Yet existing approaches often require manual tuning of parameters such as the trimming proportion and the sparsity level, which can be time-consuming and relies on trial and error.
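To make the trimming idea concrete, the following is a minimal sketch of trimmed k-means in plain Python. It is not the authors' method and not the evaluomeR API; the function name `trimmed_kmeans`, its parameters (`alpha` for the trimming proportion, `n_init` for random restarts), and the toy data are assumptions made for illustration. At each iteration the points farthest from their nearest center are set aside (label `-1`), and the centers are recomputed from the remaining points only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def trimmed_kmeans(points, k, alpha, n_init=20, n_iter=100, seed=0):
    """Illustrative trimmed k-means (hypothetical helper, not a library API).

    Returns (cost, centers, labels); label -1 marks a trimmed point.
    """
    rng = random.Random(seed)
    n = len(points)
    n_keep = n - int(alpha * n)  # number of points that are not trimmed
    best = None
    for _ in range(n_init):  # random restarts; keep the lowest trimmed cost
        centers = rng.sample(points, k)
        labels = None
        for _ in range(n_iter):
            # (squared distance, index) of the nearest center for every point
            nearest = [min((sq_dist(p, c), j) for j, c in enumerate(centers))
                       for p in points]
            # keep the n_keep points closest to their center, trim the rest
            kept = sorted(range(n), key=lambda i: nearest[i][0])[:n_keep]
            new_labels = [-1] * n
            for i in kept:
                new_labels[i] = nearest[i][1]
            if new_labels == labels:
                break  # assignments are stable
            labels = new_labels
            # recompute each center from its non-trimmed members only
            for j in range(k):
                members = [points[i] for i in kept if labels[i] == j]
                if members:
                    centers[j] = tuple(sum(col) / len(members)
                                       for col in zip(*members))
        cost = sum(sq_dist(points[i], centers[labels[i]])
                   for i in range(n) if labels[i] != -1)
        if best is None or cost < best[0]:
            best = (cost, centers, labels)
    return best


# Toy data: two tight groups plus one gross outlier (made up for illustration).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (100, -100)]
cost, centers, labels = trimmed_kmeans(points, k=2, alpha=0.12)
```

With `alpha=0.12` and nine points, exactly one point is trimmed, so the outlier no longer pulls a center away from the two groups. In practice `alpha` has to be chosen somehow, which is the manual tuning burden described above.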

To address these limitations, this work presents an automated trimmed and sparse clustering method that determines both the optimal number of clusters and the necessary tuning parameters. Our method has been made available to the biomedical community through the evaluomeR package, which enables researchers to implement sophisticated clustering efficiently without an extensive computational background. This advancement not only increases the usability of trimmed and sparse clustering, but also promotes reproducibility and accuracy in data-driven biomedical discoveries.
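The abstract does not say which criterion the method uses to pick the number of clusters or the tuning parameters. As a generic illustration of automated selection only, the sketch below scores each candidate number of clusters by mean silhouette width and keeps the best one. The silhouette criterion, the plain k-means routine with farthest-point initialization, and all function names (`kmeans`, `mean_silhouette`, `select_k`) are assumptions for this example, not the evaluomeR implementation; the same grid-search pattern could in principle be extended to a trimming proportion or a sparsity level.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # next center: the point farthest from all centers chosen so far
        centers.append(max(points,
                           key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0  # undefined for one cluster; rank it last
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def select_k(points, k_values):
    """Return (best_k, scores) for the candidate cluster counts."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy data: two well-separated groups (made up for illustration).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
best_k, scores = select_k(points, range(2, 5))
```

On this toy data the two-cluster solution has the highest mean silhouette, so `best_k` is 2. Real biomedical data rarely separate this cleanly, and a single index may not suffice there.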


Source journal

Computers in Biology and Medicine (Engineering & Technology: Biomedical Engineering)

CiteScore: 11.70

Self-citation rate: 10.40%

Articles published: 1086

Review time: 74 days

About the journal: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.