Li Zhang, Xinyan Zhang, Justin M Leach, A K M F Rahman, Carrie R Howell, Nengjun Yi
BMC Bioinformatics 26(1):98. Published 2025-04-05. DOI: https://doi.org/10.1186/s12859-025-06114-3. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11971746/pdf/
Bayesian compositional generalized linear mixed models for disease prediction using microbiome data.
The primary goal of predictive modeling for compositional microbiome data is to better understand and predict disease susceptibility from the relative abundance of microbial species. Current approaches often assume a high-dimensional sparse setting, in which only a small subset of microbiome features is relevant to the outcome. In real-world data, however, large and small effects frequently coexist, and accounting for the smaller effects can significantly improve predictive performance. To address this challenge, we developed Bayesian Compositional Generalized Linear Mixed Models for Analyzing Microbiome Data (BCGLMM). BCGLMM can identify both moderate taxa effects and the cumulative impact of numerous minor taxa, which conventional models often overlook. With a sparsity-inducing prior, the structured regularized horseshoe prior, BCGLMM jointly models phylogenetically related moderate effects. The random effect term captures sample-related minor effects by incorporating sample similarities in its variance-covariance matrix. We fitted the proposed models using Markov chain Monte Carlo (MCMC) algorithms with rstan. Extensive simulation studies showed that the proposed method achieves higher prediction accuracy than existing methods. We then applied the method to American Gut Data to predict inflammatory bowel disease (IBD). To ensure reproducibility, the code and data used in this paper are available at https://github.com/Li-Zhang28/BCGLMM.
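The abstract does not give the model equations, so the following is only a minimal sketch of the two ingredients it names: a compositional (log-contrast) fixed-effect term and a random effect whose covariance reflects sample similarity. Everything specific here is an assumption made for illustration and is not taken from the paper or its repository: the logit link, the pseudo-count of 0.5, the sum-to-zero constraint on the coefficients, the linear kernel on centered log-ratios used as the similarity matrix, and all function names. The sketch draws one linear predictor; it does not implement the structured regularized horseshoe prior or any MCMC fitting.

```python
import numpy as np

def log_composition(counts, pseudo=0.5):
    """Convert a samples-by-taxa count matrix to log relative abundances."""
    counts = np.asarray(counts, dtype=float) + pseudo  # pseudo-count avoids log(0)
    rel = counts / counts.sum(axis=1, keepdims=True)   # each row now sums to 1
    return np.log(rel)

def zero_sum(beta):
    """Center coefficients so they sum to zero (log-contrast constraint, assumed)."""
    beta = np.asarray(beta, dtype=float)
    return beta - beta.mean()

def sample_similarity(log_x):
    """Hypothetical similarity matrix: linear kernel on centered log-ratios."""
    clr = log_x - log_x.mean(axis=1, keepdims=True)
    return clr @ clr.T / clr.shape[1]

def linear_predictor(log_x, beta0, beta, u):
    """Intercept + compositional fixed effects + per-sample random effect."""
    return beta0 + log_x @ zero_sum(beta) + u

rng = np.random.default_rng(0)
counts = rng.poisson(20, size=(6, 4))        # toy data: 6 samples, 4 taxa
log_x = log_composition(counts)
K = sample_similarity(log_x)

# Random effect u ~ N(0, sigma^2 K); a small jitter keeps the covariance
# numerically positive semi-definite.
sigma = 0.5
cov = sigma**2 * K + 1e-8 * np.eye(K.shape[0])
u = rng.multivariate_normal(np.zeros(K.shape[0]), cov)

eta = linear_predictor(log_x, beta0=-1.0, beta=[1.0, -0.5, 0.0, 0.2], u=u)
prob = 1.0 / (1.0 + np.exp(-eta))            # assumed logit link for a binary outcome
```

Because the centered coefficients sum to zero, adding the same constant to every log abundance in a sample leaves the linear predictor unchanged, which is the usual reason for imposing this constraint on compositional covariates.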
About the journal:
BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology.
BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.