Optimizing clustering-based analytical methods with trimmed and sparse clustering

IF 7.0 · Zone 2 (Medicine) · JCR Q1 (Biology)
José Antonio Bernabé-Díaz, Manuel Franco, Juana-María Vivo, Jesualdo Tomás Fernández-Breis
Computers in Biology and Medicine, vol. 194, Article 110436. Journal article, published 2025-06-16.
DOI: 10.1016/j.compbiomed.2025.110436
https://www.sciencedirect.com/science/article/pii/S0010482525007875

Abstract

Clustering is an essential tool in biomedical research, often used to identify patterns and subgroups within complex, high-dimensional datasets, such as gene expression profiles, metabolomics, and patient stratification data. However, finding the optimal number of clusters and other input parameters, such as the trimming proportion and the sparsity level, is a challenging task. Traditional clustering methods may struggle to handle the noise, outliers, redundancy, and high dimensionality that are common in biomedical data, leading to unreliable or biologically uninterpretable results.
Sparse clustering methods help by emphasizing significant features while suppressing noise, and trimmed clustering can enhance robustness by excluding outliers. Yet existing approaches often require manual tuning of parameters such as the trimming proportion and the sparsity level, which is time-consuming and often relies on trial and error.
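To make the trimming idea concrete, the following is a minimal sketch of a trimmed k-means loop in plain Python. It is an illustration under stated assumptions, not the method of this paper or the evaluomeR implementation: the function name `trimmed_kmeans`, its parameters, and the toy data are all hypothetical, and the sparsity (feature weighting) component is left out entirely. At each iteration the fraction `alpha` of points farthest from their nearest center is excluded before the centers are updated.

```python
import math
import random


def trimmed_kmeans(points, k, alpha, init_centers=None, n_iter=50, seed=0):
    """Toy trimmed k-means (hypothetical helper, not the paper's algorithm).

    Each iteration ignores the floor(alpha * n) points farthest from their
    nearest center, then recomputes centers from the remaining points.
    Trimmed points receive the label -1.
    """
    rng = random.Random(seed)
    centers = list(init_centers) if init_centers else rng.sample(points, k)
    n = len(points)
    n_keep = n - int(math.floor(alpha * n))
    labels = [-1] * n
    for _ in range(n_iter):
        # Distance and index of the nearest center for every point.
        nearest = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            j = min(range(k), key=dists.__getitem__)
            nearest.append((dists[j], j))
        # Keep the n_keep closest points; the rest are trimmed this round.
        order = sorted(range(n), key=lambda i: nearest[i][0])
        kept = set(order[:n_keep])
        new_labels = [nearest[i][1] if i in kept else -1 for i in range(n)]
        # Update each center as the mean of its kept members.
        for j in range(k):
            members = [points[i] for i in range(n) if new_labels[i] == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels, centers


# Toy data: two tight groups plus one distant outlier.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (100, 100)]
labels, centers = trimmed_kmeans(pts, k=2, alpha=0.12,
                                 init_centers=[(0, 0), (10, 10)])
print(labels)   # the outlier is labeled -1
print(centers)
```

With `alpha=0.12` and nine points, exactly one point is trimmed, so the outlier no longer pulls a center away from the second group. Like ordinary k-means, the result depends on the starting centers, which is why the example fixes them explicitly.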
To address these limitations, this work presents an automated trimmed and sparse clustering method that determines both the optimal number of clusters and the necessary tuning parameters. Our method has been made available to the biomedical community through the evaluomeR package, which enables researchers to apply sophisticated clustering efficiently without an extensive computational background. This advancement not only increases the usability of trimmed and sparse clustering, but also promotes reproducibility and accuracy in data-driven biomedical discoveries.
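The abstract does not state how the number of clusters and the tuning parameters are chosen, so the sketch below only illustrates the general shape of such automation as an exhaustive grid search. Every name in it (`select_parameters`, `cluster_fn`, `score_fn`, and the toy stand-ins) is hypothetical and is not part of the evaluomeR API; the search strategy and the quality criterion are assumptions made for illustration.

```python
import itertools


def select_parameters(data, cluster_fn, score_fn, k_values, alphas, sparsities):
    """Exhaustive search over (k, alpha, sparsity); returns (score, params).

    cluster_fn and score_fn are supplied by the caller. Higher scores are
    assumed to be better.
    """
    best = None
    for k, alpha, s in itertools.product(k_values, alphas, sparsities):
        result = cluster_fn(data, k=k, alpha=alpha, sparsity=s)
        score = score_fn(data, result)
        if best is None or score > best[0]:
            best = (score, {"k": k, "alpha": alpha, "sparsity": s})
    return best


# Toy stand-ins so the sketch runs end to end. A real analysis would plug in
# a trimmed and sparse clustering routine and a cluster-quality index.
def toy_cluster(data, k, alpha, sparsity):
    return {"k": k, "alpha": alpha, "sparsity": sparsity}


def toy_score(data, result):
    # By construction, this peaks at k=3, alpha=0.1, sparsity=2.0.
    return (-abs(result["k"] - 3)
            - abs(result["alpha"] - 0.1)
            - abs(result["sparsity"] - 2.0))


best_score, best_params = select_parameters(
    data=None,
    cluster_fn=toy_cluster,
    score_fn=toy_score,
    k_values=range(2, 7),
    alphas=[0.0, 0.05, 0.1],
    sparsities=[1.5, 2.0, 3.0],
)
print(best_params)
```

Passing the clustering routine and the scoring function as arguments keeps the search independent of any particular criterion, which matters here because the criterion used by the authors is not given in the abstract.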
Source journal
Computers in Biology and Medicine (Engineering and Technology: Biomedical Engineering)
CiteScore: 11.70
Self-citation rate: 10.40%
Articles published: 1086
Review time: 74 days
Journal description: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.