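Association of body index with fecal microbiome in children cohorts with ethnic-geographic factor interaction: accurately using a Bayesian zero-inflated negative binomial regression model.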

IF 5 | CAS Zone 2 (Biology) | JCR Q1 MICROBIOLOGY
mSystems Pub Date : 2024-11-21 DOI:10.1128/msystems.01345-24
Jian Huang, Yanzhuan Lu, Fengwei Tian, Yongqing Ni
mSystems, article e0134524. Publication type: Journal Article.
Citations: 0

Abstract
Association of body index with fecal microbiome in children cohorts with ethnic-geographic factor interaction: accurately using a Bayesian zero-inflated negative binomial regression model.

The exponential growth of high-throughput sequencing (HTS) data on microbial communities gives researchers an unparalleled opportunity to probe the association of microorganisms with host phenotype. This growth also poses a challenge, because microbial data are complex, sparse, discrete, and prone to zero inflation. Herein, after comparing 10 distinct count models on simulated data, we propose a Bayesian zero-inflated negative binomial (ZINB) regression model that identifies differentially abundant taxa associated with distinct host phenotypes and quantifies the effects of covariates on these taxa. The proposed model is more accurate than conventional Hurdle and INLA models, especially under zero inflation and overdispersion. Moreover, we confirm that dispersion parameters significantly affect the accuracy of model results, and that this weakness gradually diminishes as the number of analyzed samples increases. Applying the model to amplicon data from a real multi-ethnic cohort of children, we found that only a subset of taxa showed zero inflation, suggesting that the prevailing understanding and processing of microbial count data in most previous microbiome studies were overly dogmatic. In practice, our pipeline for integrating bacterial differential abundance in microbiome data with relevant covariates is effective and feasible. Taken together, we expect the method to extend to microbiota studies of various multi-cohort populations.
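To make "zero inflation" and "overdispersion" concrete, the following is a minimal sketch of a ZINB distribution in plain Python. It uses the common mean/dispersion parameterization (mean `mu`, dispersion `r`, structural-zero probability `pi`); these names, the helper functions, and the parameter values are illustrative assumptions and are not taken from the paper, whose actual model, priors, and software are not described in the abstract.

```python
import math
import random


def nb_log_pmf(y, mu, r):
    """Log-pmf of a negative binomial with mean mu > 0 and dispersion r > 0."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))


def zinb_log_pmf(y, mu, r, pi):
    """Log-pmf of a ZINB: a structural zero with probability pi, else NB(mu, r)."""
    nb = nb_log_pmf(y, mu, r)
    if y == 0:
        # A zero can be structural or a sampling zero from the NB part.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb


def sample_zinb(mu, r, pi, rng):
    """Draw one ZINB count using the gamma-Poisson mixture form of the NB."""
    if rng.random() < pi:
        return 0  # structural zero
    lam = rng.gammavariate(r, mu / r)  # shape r, scale mu / r, so mean mu
    # Knuth's multiplication method for a Poisson draw (fine for moderate lam).
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


if __name__ == "__main__":
    rng = random.Random(0)
    mu, r, pi = 5.0, 2.0, 0.3  # hypothetical values, for illustration only
    draws = [sample_zinb(mu, r, pi, rng) for _ in range(20000)]
    zero_frac = sum(d == 0 for d in draws) / len(draws)
    print("theoretical P(Y = 0):", math.exp(zinb_log_pmf(0, mu, r, pi)))
    print("observed zero fraction:", zero_frac)
```

With these illustrative values the theoretical zero probability is 0.3 + 0.7 × (2/7)² = 5/14 ≈ 0.357, well above the 4/49 ≈ 0.082 that the negative binomial component alone would give; that gap is the excess of zeros a ZINB model is meant to absorb.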

Importance: The microbiome is closely associated with physical indicators of the body, such as height, weight, age, and BMI, which can be used as measures of human health. Accurately identifying which taxa in the microbiome are closely related to indicators of physical development is valuable, because such taxa can serve as microbial markers of regional child growth trajectories. The zero-inflated negative binomial (ZINB) model, a type of Bayesian generalized linear model, can effectively model complex biological systems. We present a ZINB regression model that identifies differentially abundant taxa associated with distinct host phenotypes and quantifies the effects of covariates on these taxa, and we demonstrate that its accuracy is superior to that of traditional Hurdle and INLA models. Our pipeline for integrating bacterial differential abundance in microbiome data with relevant covariates is effective and feasible.
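For readers unfamiliar with the model class, a ZINB regression is commonly written as below. This is the generic textbook form, given as an assumption for orientation only: the symbols (count $Y_i$ for sample $i$, mean $\mu_i$, dispersion $r$, structural-zero probability $\pi_i$, covariate vectors $x_i$ and $z_i$, coefficients $\beta$ and $\gamma$) and the choice of log and logit links are not taken from the paper, and the authors' priors are not stated in the abstract.

```latex
\begin{aligned}
P(Y_i = 0) &= \pi_i + (1 - \pi_i)\left(\frac{r}{r + \mu_i}\right)^{r}, \\
P(Y_i = y) &= (1 - \pi_i)\,
  \frac{\Gamma(y + r)}{\Gamma(r)\, y!}
  \left(\frac{r}{r + \mu_i}\right)^{r}
  \left(\frac{\mu_i}{r + \mu_i}\right)^{y},
  \qquad y = 1, 2, \dots \\
\log \mu_i &= x_i^{\top} \beta, \qquad
\operatorname{logit} \pi_i = z_i^{\top} \gamma .
\end{aligned}
```

In this form the negative binomial component has variance $\mu_i + \mu_i^{2}/r$, so a small $r$ corresponds to strong overdispersion, and the coefficients $\beta$ quantify covariate effects on the abundance of a taxon on the log scale.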

Source journal
mSystems
Subject category: Biochemistry, Genetics and Molecular Biology - Biochemistry
CiteScore: 10.50
Self-citation rate: 3.10%
Articles published: 308
Review time: 13 weeks
Journal description: mSystems™ will publish preeminent work that stems from applying technologies for high-throughput analyses to achieve insights into the metabolic and regulatory systems at the scale of both the single cell and microbial communities. The scope of mSystems™ encompasses all important biological and biochemical findings drawn from analyses of large data sets, as well as new computational approaches for deriving these insights. mSystems™ will welcome submissions from researchers who focus on the microbiome, genomics, metagenomics, transcriptomics, metabolomics, proteomics, glycomics, bioinformatics, and computational microbiology. mSystems™ will provide streamlined decisions, while carrying on ASM's tradition of rigorous peer review.