Bayesian compositional generalized linear mixed models for disease prediction using microbiome data.

Impact factor 2.9; Zone 3 (Biology); JCR Q2 (Biochemical Research Methods)
Li Zhang, Xinyan Zhang, Justin M Leach, A K M F Rahman, Carrie R Howell, Nengjun Yi
BMC Bioinformatics 26(1), p. 98. Journal article, published 2025-04-05.
DOI: 10.1186/s12859-025-06114-3 (https://doi.org/10.1186/s12859-025-06114-3)
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11971746/pdf/

Abstract

The primary goal of predictive modeling for compositional microbiome data is to understand and predict disease susceptibility from the relative abundance of microbial species. Current approaches often assume a high-dimensional sparse setting, in which only a small subset of microbiome features is relevant to the outcome. In real-world data, however, large and small effects frequently coexist, and accounting for the smaller effects can significantly improve predictive performance. To address this, we developed Bayesian Compositional Generalized Linear Mixed Models for Analyzing Microbiome Data (BCGLMM). BCGLMM identifies both moderate taxa effects and the cumulative impact of numerous minor taxa, which conventional models often overlook. Through a sparsity-inducing prior, the structured regularized horseshoe prior, BCGLMM shares information among phylogenetically related moderate effects. The random effect term captures sample-related minor effects by incorporating sample similarities in its variance-covariance matrix. We fitted the proposed models using Markov chain Monte Carlo (MCMC) algorithms with rstan. Extensive simulation studies showed that the proposed method achieves higher prediction accuracy than existing methods. We then applied the method to American Gut Data to predict inflammatory bowel disease (IBD). To ensure reproducibility, the code and data used in this paper are available at https://github.com/Li-Zhang28/BCGLMM.
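
The model structure described in the abstract (taxa-level fixed effects on compositional predictors, plus a sample-level random effect whose covariance encodes sample similarity) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation, which is fitted with rstan. The pseudocount, the sum-to-zero constraint on the coefficients (the usual log-contrast convention for compositional predictors), the logistic link, and the choice of a histogram-intersection similarity (equal to one minus the Bray-Curtis dissimilarity for compositions) are all illustrative choices that the abstract does not specify. The function names are hypothetical.

```python
import numpy as np


def log_relative_abundance(counts, pseudocount=0.5):
    """Turn a samples-by-taxa count matrix into log relative abundances.

    The pseudocount (an assumption) avoids log(0) for absent taxa.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    return np.log(x / x.sum(axis=1, keepdims=True))


def similarity_kernel(rel):
    """Sample-by-sample similarity: sum over taxa of min(x_ik, x_jk).

    For rows that sum to 1 this equals 1 - Bray-Curtis dissimilarity,
    and it is positive semi-definite, so it can serve as a covariance.
    """
    rel = np.asarray(rel, dtype=float)
    return np.minimum(rel[:, None, :], rel[None, :, :]).sum(axis=2)


def simulate_outcome(counts, beta, sigma_u=0.5, seed=0):
    """Simulate binary outcomes from a compositional logistic mixed model.

    eta = Z @ beta + u, with sum(beta) = 0 and u ~ N(0, sigma_u^2 * K).
    """
    rng = np.random.default_rng(seed)
    log_rel = log_relative_abundance(counts)
    beta = np.asarray(beta, dtype=float)
    beta = beta - beta.mean()  # enforce the sum-to-zero constraint
    kernel = similarity_kernel(np.exp(log_rel))
    # Random effect: correlated across samples with similar compositions.
    u = rng.multivariate_normal(np.zeros(len(kernel)), sigma_u**2 * kernel)
    eta = log_rel @ beta + u
    prob = 1.0 / (1.0 + np.exp(-eta))
    return rng.binomial(1, prob), prob


if __name__ == "__main__":
    toy_counts = np.array([[10, 0, 5], [3, 7, 1], [0, 2, 8], [4, 4, 4]])
    y, prob = simulate_outcome(toy_counts, beta=[1.0, -0.5, 0.0])
    print("outcomes:", y)
    print("probabilities:", np.round(prob, 3))
```

In a full Bayesian fit, `beta` would receive a sparsity-inducing prior such as the structured regularized horseshoe mentioned in the abstract, and `sigma_u` would be estimated rather than fixed. The sketch only shows how the fixed and random parts combine in the linear predictor.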

Source journal: BMC Bioinformatics (Biology; Biochemical Research Methods)
CiteScore: 5.70. Self-citation rate: 3.30%. Articles published: 506. Review time: 4.3 months.
Journal description: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.