Bayesian Variable Shrinkage and Selection in Compositional Data Regression: Application to Oral Microbiome.

Jyotishka Datta, Dipankar Bandyopadhyay
Journal of the Indian Society for Probability and Statistics, vol. 25, no. 2, pp. 491-515, 2024 (Epub 2024-05-29).
DOI: https://doi.org/10.1007/s41096-024-00194-9
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11470902/pdf/

Abstract

Microbiome studies generate multivariate compositional responses, such as taxa counts, which are strictly non-negative, bounded, reside within a simplex, and are subject to a unit-sum constraint. In the presence of covariates (which can be moderate to high dimensional), they are popularly modeled via the Dirichlet-Multinomial (D-M) regression framework. In this paper, we consider a Bayesian approach for estimation and inference under a D-M compositional framework, and present a comparative evaluation of some state-of-the-art continuous shrinkage priors for efficient variable selection, to identify the most significant associations between the available covariates and taxonomic abundance. Specifically, we compare the performance of the horseshoe and horseshoe+ priors (with the Bayesian lasso as the benchmark), using Hamiltonian Monte Carlo techniques for posterior sampling and for generating posterior credible intervals. Our simulation studies using synthetic data demonstrate excellent recovery and estimation accuracy of the sparse parameter regime by the continuous shrinkage priors. We further illustrate our method via application to a motivating oral microbiome dataset generated from the NYC-Hanes study. An RStan implementation of our method is available on GitHub: https://github.com/dattahub/compshrink.
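
To make the model structure concrete, the following is a minimal Python sketch of the two ingredients named in the abstract: a Dirichlet-Multinomial log-likelihood for one sample's taxa counts and a draw of regression coefficients from a horseshoe prior. It is not the authors' RStan code. The log link between covariates and concentration parameters, the function names (`dm_logpmf`, `concentrations`, `draw_horseshoe`), and the fixed global scale `tau` are assumptions made only for illustration; the abstract does not specify them.

```python
import math
import random


def dm_logpmf(y, alpha):
    """Log pmf of a Dirichlet-Multinomial for counts y and concentrations alpha."""
    n = sum(y)
    a = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for yk, ak in zip(y, alpha):
        out += math.lgamma(yk + ak) - math.lgamma(yk + 1) - math.lgamma(ak)
    return out


def concentrations(x, beta):
    """Map covariates x to positive concentrations, one per taxon.

    beta[j][k] is the effect of covariate j on taxon k.
    The log link used here is an assumption, not taken from the abstract.
    """
    n_taxa = len(beta[0])
    return [
        math.exp(sum(xj * bj[k] for xj, bj in zip(x, beta)))
        for k in range(n_taxa)
    ]


def draw_horseshoe(n_cov, n_taxa, tau, rng):
    """Draw a coefficient matrix from a horseshoe prior.

    Each coefficient is Normal(0, (tau * lam)^2) with its own local scale
    lam ~ half-Cauchy(0, 1); tau is a global scale, fixed here for
    simplicity (an assumption).
    """
    beta = []
    for _ in range(n_cov):
        row = []
        for _ in range(n_taxa):
            # Absolute value of a standard Cauchy draw (inverse-CDF method).
            lam = abs(math.tan(math.pi * (rng.random() - 0.5)))
            row.append(rng.gauss(0.0, tau * lam))
        beta.append(row)
    return beta


# Usage with small hand-picked values (hypothetical data).
x = [1.0, 2.0]                        # intercept and one covariate
beta = [[0.0, 1.0], [0.5, 0.0]]       # 2 covariates x 2 taxa
alpha = concentrations(x, beta)
loglik = dm_logpmf([3, 7], alpha)     # counts for the two taxa
prior_draw = draw_horseshoe(2, 2, tau=0.1, rng=random.Random(0))
```

The heavy-tailed half-Cauchy local scales are what let most coefficients shrink toward zero while a few stay large. In a full analysis the posterior over `beta` would be sampled, for example by Hamiltonian Monte Carlo as the abstract describes, rather than drawn from the prior as in this sketch.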
