A framework for predictive modeling of microbiome multi-omics data: latent interacting variable-effects (LIVE) modeling.

IF 3.3 | Zone 3, Biology | JCR Q2, Biochemical Research Methods

BMC Bioinformatics Pub Date : 2025-04-29 DOI:10.1186/s12859-025-06134-z

Javier Munoz Briones, Douglas K Brubaker

BMC Bioinformatics, volume 26(1), page 115. Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12042529/pdf/


Abstract

Background: The number and size of multi-omics datasets with paired measurements of the host and microbiome are rapidly increasing with the advance of sequencing technologies. As it becomes routine to generate these datasets, computational methods to aid in their interpretation become increasingly important. Here, we present a framework for integration of microbiome multi-omics data: Latent Interacting Variable Effects (LIVE) modeling. LIVE integrates multi-omics data using single-omic latent variables (LVs) organized in a structured meta-model to determine the combinations of features most predictive of a phenotype or condition.

Results: We developed a supervised version of LIVE leveraging sparse Partial Least Squares Discriminant Analysis (sPLS-DA) LVs, and an unsupervised version leveraging sparse Principal Component Analysis (sPCA) principal components, both of which can incorporate covariate awareness. LIVE performance was tested on publicly available metagenomic and metabolomic datasets from patients with Crohn's disease (CD) and ulcerative colitis (UC) in the PRISM and LLDeep cohorts, and benchmarked against existing gut microbiome multi-omics approaches and on vaginal microbiome datasets, achieving consistent and comparable performance. In addition to these benchmarking efforts, we present a detailed analysis and interpretation of both versions of LIVE using the PRISM and LLDeep cohorts. LIVE reduced the number of feature interactions from the original datasets for CD and UC from millions to fewer than 20,000 while conditioning the disease-predictive power of gut microbes, metabolites, and enzymes on clinical variables.

Conclusions: LIVE makes a distinct, complementary contribution to current methods for integrating microbiome data and offers key advantages over existing approaches in the interpretable integration of multi-omics data with clinical variables to predict disease outcomes and identify microbiome mechanisms of disease.
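
The meta-model structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes plain PCA (via SVD) in place of sPCA, assumes the meta-model combines per-omic LV scores, pairwise cross-omic LV products, and clinical covariates as columns of one design matrix, and uses randomly generated placeholder data. The names `latent_scores`, `live_design_matrix`, and the block names are hypothetical.

```python
import numpy as np


def latent_scores(X, n_components):
    """Scores of one omic block on its top principal components.

    Plain PCA via SVD is used here as a stand-in for sparse PCA.
    """
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def live_design_matrix(blocks, covariates, n_components=2):
    """Assumed meta-model layout: LV main effects, cross-omic LV
    interaction products, then clinical covariates."""
    names = sorted(blocks)
    scores = {name: latent_scores(blocks[name], n_components) for name in names}
    columns, labels = [], []

    # Main effects: each latent variable of each omic block.
    for name in names:
        for k in range(n_components):
            columns.append(scores[name][:, k])
            labels.append(f"{name}_LV{k + 1}")

    # Interactions: products of LVs taken from two different omic blocks.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for j in range(n_components):
                for k in range(n_components):
                    columns.append(scores[a][:, j] * scores[b][:, k])
                    labels.append(f"{a}_LV{j + 1}:{b}_LV{k + 1}")

    # Clinical covariates enter as additional columns.
    for c in range(covariates.shape[1]):
        columns.append(covariates[:, c])
        labels.append(f"covariate_{c + 1}")

    return np.column_stack(columns), labels


# Placeholder data only: 30 samples, three hypothetical omic blocks.
rng = np.random.default_rng(0)
n_samples = 30
blocks = {
    "taxa": rng.normal(size=(n_samples, 50)),
    "metabolites": rng.normal(size=(n_samples, 80)),
    "enzymes": rng.normal(size=(n_samples, 40)),
}
covariates = rng.normal(size=(n_samples, 2))

design, labels = live_design_matrix(blocks, covariates, n_components=2)
raw_pairs = 50 * 80 + 50 * 40 + 80 * 40  # cross-omic feature pairs before reduction
print(design.shape, raw_pairs)
```

Under these assumptions the model sees a few dozen LV-level terms rather than every cross-omic feature pair, which is the kind of reduction in interaction count the abstract reports; a classifier such as logistic regression could then be fitted on `design`.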




Source journal: BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70

Self-citation rate: 3.30%

Articles published: 506

Review time: 4.3 months

Journal description: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.