Variational Bayesian clustering and variable selection for discrete biomedical data.

IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Bioinformatics advances. Pub Date: 2025-03-17. eCollection Date: 2025-01-01. DOI: 10.1093/bioadv/vbaf055

Jackie Rao, Paul D W Kirk


Citations: 0


VICatMix: variational Bayesian clustering and variable selection for discrete biomedical data.

Summary: Effective clustering of biomedical data is crucial in precision medicine, enabling accurate stratification of patients or samples. However, the growth in availability of high-dimensional categorical data, including 'omics data, necessitates computationally efficient clustering algorithms. We present VICatMix, a variational Bayesian finite mixture model designed for the clustering of categorical data. The use of variational inference (VI) in its training allows the model to outperform competitors in terms of computational time and scalability, while maintaining high accuracy. VICatMix furthermore performs variable selection, enhancing its performance on high-dimensional, noisy data. The proposed model incorporates summarization and model averaging to mitigate poor local optima in VI, allowing for improved estimation of the true number of clusters simultaneously with feature saliency. We demonstrate the performance of VICatMix with both simulated and real-world data, including applications to datasets from The Cancer Genome Atlas, showing its use in cancer subtyping and driver gene discovery. We demonstrate VICatMix's potential utility in integrative cluster analysis with different 'omics datasets, enabling the discovery of novel disease subtypes.

Availability and implementation: VICatMix is freely available as an R package via CRAN, incorporating C++ for faster computation, at https://CRAN.R-project.org/package=VICatMix.
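The summary describes a variational Bayesian finite mixture model for categorical data. As a rough illustration of that idea only, the sketch below runs mean-field coordinate-ascent updates for a mixture of categorical distributions with Dirichlet priors. It is not the VICatMix package's code or API: the names `digamma` and `fit_categorical_mixture`, the hyperparameters `alpha` and `eps`, and the round-robin initialisation are all assumptions made for this example, and variable selection and model averaging are left out.

```python
import math


def digamma(x):
    """Digamma function for x > 0: upward recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def fit_categorical_mixture(data, n_components, n_categories,
                            alpha=1.0, eps=1.0, n_iter=50):
    """Mean-field VI for a mixture of categoricals (hypothetical helper).

    data: list of rows, each a list of integer codes in [0, n_categories).
    Returns (labels, responsibilities, Dirichlet weights of the mixture).
    """
    n_vars = len(data[0])
    comps = range(n_components)
    # Assumption: deterministic round-robin hard initialisation.
    # Practical use would more likely rely on several random restarts.
    resp = [[1.0 if k == i % n_components else 0.0 for k in comps]
            for i in range(len(data))]
    alpha_post = [alpha] * n_components
    for _ in range(n_iter):
        # Update the Dirichlet parameters from the soft assignments.
        alpha_post = [alpha + sum(r[k] for r in resp) for k in comps]
        eps_post = [[[eps] * n_categories for _ in range(n_vars)]
                    for _ in comps]
        for row, r in zip(data, resp):
            for k in comps:
                for j, value in enumerate(row):
                    eps_post[k][j][value] += r[k]
        # Expected log parameters under the variational posterior.
        psi_total = digamma(sum(alpha_post))
        log_pi = [digamma(a) - psi_total for a in alpha_post]
        log_phi = [[[digamma(e) - digamma(sum(cats)) for e in cats]
                    for cats in comp] for comp in eps_post]
        # Update the responsibilities with a numerically stable softmax.
        new_resp = []
        for row in data:
            scores = [log_pi[k] + sum(log_phi[k][j][v]
                                      for j, v in enumerate(row))
                      for k in comps]
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            new_resp.append([w / total for w in weights])
        resp = new_resp
    labels = [max(comps, key=lambda k: r[k]) for r in resp]
    return labels, resp, alpha_post
```

Calling `fit_categorical_mixture(rows, n_components=2, n_categories=2)` on a small binary matrix returns one hard label per row together with the soft responsibilities. Because updates of this kind can stall in poor local optima, a single run from one initialisation should be read with caution, which is the problem the summarization and model averaging mentioned in the summary are meant to address.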


Source journal

Bioinformatics advances

CiteScore

1.60

Self-citation rate

0.00%

Articles published