I-SVVS: integrative stochastic variational variable selection to explore joint patterns of multi-omics microbiome data.

IF 7.7 | CAS Zone 2 (Biology) | JCR Q1, BIOCHEMICAL RESEARCH METHODS

Briefings in Bioinformatics. Pub Date: 2025-05-01. DOI: 10.1093/bib/bbaf132

Tung Dang, Yushiro Fuji, Kie Kumaishi, Erika Usui, Shungo Kobori, Takumi Sato, Megumi Narukawa, Yusuke Toda, Kengo Sakurai, Yuji Yamasaki, Hisashi Tsujimoto, Masami Yokota Hirai, Yasunori Ichihashi, Hiroyoshi Iwata

Volume 26, Issue 3. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12122083/pdf/

Citations: 0

Abstract

High-dimensional multi-omics microbiome data play an important role in elucidating microbial community interactions with their hosts and environment in critical diseases and ecological changes. Although Bayesian clustering methods have recently been used for the integrated analysis of multi-omics data, no method designed specifically for multi-omics microbiome data has been proposed. In this study, we propose a novel framework called integrative stochastic variational variable selection (I-SVVS), which is an extension of stochastic variational variable selection for high-dimensional microbiome data. The I-SVVS approach applies a dedicated Bayesian mixture model to each type of omics data, such as an infinite Dirichlet multinomial mixture model for microbiome data and an infinite Gaussian mixture model for metabolomic data. This approach is expected to reduce the computational time of the clustering process and improve the accuracy of the clustering results. Additionally, I-SVVS identifies a critical set of representative variables in multi-omics microbiome data. Three datasets from soybean, mice, and humans (each combining microbiome and metabolome data) were used to demonstrate the potential of I-SVVS. The results indicate that I-SVVS achieved improved accuracy and faster computation compared to existing methods across all test datasets. It effectively identified key microbiome species and metabolites characterizing each cluster. For instance, the computational analysis of the soybean dataset, including 377 samples with 16,943 microbiome species and 265 metabolome features, was completed in 2.18 hours using I-SVVS, compared to 2.35 days with Clusternomics and 1.12 days with iClusterPlus. The software for this analysis, written in Python, is freely available at https://github.com/tungtokyo1108/I-SVVS.
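The abstract names the per-omics mixture components without spelling them out. As a rough illustration only (this is not code from the I-SVVS repository), the sketch below evaluates a Dirichlet-multinomial log-likelihood for a microbiome count vector, a Gaussian log-density for a metabolite profile, truncated stick-breaking weights of the kind commonly used to approximate an infinite mixture, and the resulting cluster responsibilities. The function names, the diagonal covariance and the finite truncation are assumptions made for illustration. The stochastic variational updates and the variable-selection step of I-SVVS are not reproduced.

```python
import math

# Hypothetical helpers for illustration only; not the I-SVVS API.

def dirichlet_multinomial_loglik(counts, alpha):
    """Log-probability of one sample's taxon count vector under a
    Dirichlet-multinomial component with concentration vector alpha."""
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient: log n! - sum_j log x_j!
    ll = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Gamma-function ratio obtained by integrating out the Dirichlet prior
    ll += math.lgamma(a0) - math.lgamma(n + a0)
    ll += sum(math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha))
    return ll

def diag_gaussian_loglik(x, mean, var):
    """Log-density of a metabolite profile under a Gaussian component
    with diagonal covariance (an assumption made for this sketch)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def stick_breaking_weights(betas):
    """Truncated stick-breaking: pi_k = beta_k * prod_{l<k} (1 - beta_l).
    The leftover mass prod_k (1 - beta_k) is left to unused clusters."""
    weights, remaining = [], 1.0
    for b in betas:
        weights.append(b * remaining)
        remaining *= 1.0 - b
    return weights

def responsibilities(log_liks, weights):
    """Cluster probabilities r_k proportional to pi_k * p(x | theta_k),
    normalised with log-sum-exp for numerical stability."""
    logs = [math.log(w) + ll for w, ll in zip(weights, log_liks)]
    top = max(logs)
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    # Toy values chosen only to exercise the functions
    w = stick_breaking_weights([0.6, 0.5])
    ll = [dirichlet_multinomial_loglik([5, 0, 2], [1.0, 0.5, 0.5]),
          dirichlet_multinomial_loglik([5, 0, 2], [0.2, 2.0, 0.2])]
    print(responsibilities(ll, w))
```

The sketch scores each omics layer on its own. It makes no attempt to model how I-SVVS couples the microbiome and metabolome layers into a joint clustering.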


Source journal

Briefings in Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months

Journal description: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.