Bayesian Generalized Linear Models for Analyzing Compositional and Sub-Compositional Microbiome Data via EM Algorithm.

IF 1.8 | Zone 4, Medicine | JCR Q3, MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics in Medicine Pub Date : 2025-03-30 DOI:10.1002/sim.70084

Li Zhang, Zhenying Ding, Jinhong Cui, Xiaoxiao Zhou, Nengjun Yi

Statistics in Medicine, 44(7), e70084. Publication type: Journal Article. Open access: no.

Citations: 0

Abstract


Bayesian Generalized Linear Models for Analyzing Compositional and Sub-Compositional Microbiome Data via EM Algorithm.

The study of compositional microbiome data is critical for exploring the functional roles of microbial communities in human health and disease. Recent work has shifted from traditional log-ratio transformations of compositional covariates to a sum-to-zero constraint on the corresponding coefficients. Various approaches, including penalized regression and Markov chain Monte Carlo (MCMC) algorithms, have been extended to enforce this constraint. However, these methods have limitations: penalized regression yields only point estimates, which limits uncertainty assessment, while MCMC methods, although reliable, are computationally intensive, particularly in high-dimensional settings. To address these limitations, we propose Bayesian generalized linear models (GLMs) for analyzing compositional and sub-compositional microbiome data. Our model places a spike-and-slab double-exponential prior on the microbiome coefficients, inducing weak shrinkage on large coefficients and strong shrinkage on irrelevant ones, which makes it well suited to high-dimensional microbiome data. The sum-to-zero constraint is handled through soft centering, by placing a prior distribution on the sum of the compositional or sub-compositional coefficients. To reduce the computational burden, we developed a fast and stable algorithm that incorporates expectation-maximization (EM) steps into the routine iteratively weighted least squares (IWLS) algorithm for fitting GLMs. The performance of the proposed method was assessed in extensive simulation studies, which show that it outperforms existing methods, with more accurate coefficient estimates and lower prediction error. We also applied the method to a microbiome study to identify microorganisms linked to inflammatory bowel disease (IBD). The methods are implemented in the freely available R package BhGLM (https://github.com/nyiuab/BhGLM).
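Two ideas in the abstract can be illustrated with a short sketch. The abstract does not give the paper's exact parameterization, so the code below is only an illustration under stated assumptions: it uses the generic log-contrast form (a linear predictor in the log abundances) and the textbook two-component double-exponential mixture for the spike-and-slab prior. All function names, the scales `s0` and `s1`, the mixing weight `theta`, and the toy numbers are hypothetical; this is not the BhGLM API. The first part checks that, when the coefficients sum to zero, the predictor is unchanged by rescaling the sample and equals a log-ratio formulation. The second part computes an E-step style quantity showing why small coefficients are shrunk strongly and large ones weakly.

```python
import math
import random


def log_contrast(abundances, beta):
    """Linear predictor sum_j beta_j * log(x_j) for one sample."""
    return sum(b * math.log(x) for b, x in zip(beta, abundances))


def double_exponential_pdf(b, scale):
    """Density of a double-exponential (Laplace) distribution centred at 0."""
    return math.exp(-abs(b) / scale) / (2.0 * scale)


def slab_probability(b, theta, s0, s1):
    """Posterior probability that coefficient b belongs to the wide slab
    (scale s1) rather than the narrow spike (scale s0 < s1)."""
    slab = theta * double_exponential_pdf(b, s1)
    spike = (1.0 - theta) * double_exponential_pdf(b, s0)
    return slab / (slab + spike)


def shrinkage_weight(b, theta, s0, s1):
    """Expected inverse prior scale: a larger value means stronger shrinkage."""
    p = slab_probability(b, theta, s0, s1)
    return (1.0 - p) / s0 + p / s1


# Part 1: sum-to-zero coefficients make the predictor scale invariant.
random.seed(0)
raw = [random.uniform(1.0, 100.0) for _ in range(5)]  # hypothetical taxon counts
total = sum(raw)
composition = [r / total for r in raw]                # relative abundances

beta = [0.8, -0.5, 0.0, 0.0, -0.3]                    # assumed; sums to zero
eta_counts = log_contrast(raw, beta)
eta_props = log_contrast(composition, beta)

# Equivalent log-ratio form using the last taxon as the reference.
ref = composition[-1]
eta_alr = sum(
    b * math.log(x / ref) for b, x in zip(beta[:-1], composition[:-1])
)

# Part 2: spike-and-slab weights for a small and a large coefficient.
theta, s0, s1 = 0.5, 0.05, 1.0                        # assumed hyperparameters
weight_small = shrinkage_weight(0.01, theta, s0, s1)
weight_large = shrinkage_weight(0.8, theta, s0, s1)

print(f"eta (counts) = {eta_counts:.6f}, eta (proportions) = {eta_props:.6f}")
print(f"eta (log-ratio form) = {eta_alr:.6f}")
print(f"shrinkage weight: small coef = {weight_small:.2f}, large coef = {weight_large:.2f}")
```

In an EM-within-IWLS scheme of the kind the abstract describes, weights like these would be recomputed at each iteration from the current coefficient estimates and then used as penalty weights in the weighted least squares update. The soft sum-to-zero prior is not modelled here; the sketch simply fixes coefficients that already sum to zero.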


Source journal

Statistics in Medicine (Medicine: Public, Environmental & Occupational Health)

CiteScore: 3.40

Self-citation rate: 10.00%

Articles published: 334

Review time: 2-4 weeks

About the journal: The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case studies where creative use or technical generalization of established methodology is directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.