Quantifying uncertainty in microbiome-based prediction using Gaussian processes with microbial community dissimilarities.

Impact Factor: 2.4; JCR Q2 (Mathematical & Computational Biology)
Bioinformatics Advances. Pub Date: 2025-03-11; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf045
Asahi Adachi, Fan Zhang, Shigehiko Kanaya, Naoaki Ono

Abstract

Summary: The human microbiome is closely associated with the health and disease of the human host. Machine learning models have recently utilized the human microbiome to predict health conditions and disease status. Quantifying predictive uncertainty is essential for the reliable application of these microbiome-based prediction models in clinical settings. However, uncertainty quantification in such prediction models remains unexplored. In this study, we have developed a probabilistic prediction model using a Gaussian process (GP) with a kernel function that incorporates microbial community dissimilarities. We evaluated the performance of probabilistic prediction across three regression tasks: chronological age, body mass index, and disease severity, using publicly available human gut microbiome datasets. The results demonstrated that our model outperformed existing methods in terms of probabilistic prediction accuracy. Furthermore, we found that the confidence levels closely matched the empirical coverage and that data points predicted with lower uncertainty corresponded to lower prediction errors. These findings suggest that GP regression models incorporating community dissimilarities effectively capture the characteristics of phylogenetic, high-dimensional, and sparse microbial abundance data. Our study provides a more reliable framework for microbiome-based prediction, potentially advancing the application of microbiome data in health monitoring and disease diagnosis in clinical settings.
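The core idea in the summary, a GP whose kernel is built from community dissimilarities and whose predictive intervals are checked against empirical coverage, can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). The following are all assumptions made for illustration: the choice of Bray-Curtis dissimilarity, the exponential kernel form `variance * exp(-D / length_scale)`, the fixed hyperparameters (a real model would fit them), the eigenvalue clipping used to guard against a non-positive-semidefinite kernel matrix, and the synthetic Dirichlet data.

```python
import numpy as np
from statistics import NormalDist


def bray_curtis(A, B):
    """Pairwise Bray-Curtis dissimilarity between rows of A and rows of B."""
    num = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=2)
    den = (A[:, None, :] + B[None, :, :]).sum(axis=2)
    return num / den


def dissimilarity_kernel(D, length_scale=0.5, variance=1.0):
    # Assumed kernel form: similarity decays exponentially with dissimilarity.
    return variance * np.exp(-D / length_scale)


def gp_predict(X_tr, y_tr, X_te, length_scale=0.5, variance=1.0, noise=0.1):
    """Return the GP predictive mean and standard deviation for X_te."""
    K = dissimilarity_kernel(bray_curtis(X_tr, X_tr), length_scale, variance)
    K_s = dissimilarity_kernel(bray_curtis(X_te, X_tr), length_scale, variance)
    # A dissimilarity-based kernel is not guaranteed to be positive
    # semidefinite, so clip negative eigenvalues before adding the noise term.
    w, V = np.linalg.eigh(K)
    w = np.clip(w, 0.0, None) + noise
    K_inv = (V / w) @ V.T
    mu0 = y_tr.mean()  # constant mean function (an assumption)
    mean = mu0 + K_s @ K_inv @ (y_tr - mu0)
    # k(x, x) = variance because D(x, x) = 0; add noise for observed targets.
    var = variance + noise - np.einsum("ij,jk,ik->i", K_s, K_inv, K_s)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def empirical_coverage(y, mean, std, level=0.95):
    """Fraction of targets that fall inside the central `level` interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return float(np.mean(np.abs(y - mean) <= z * std))


# Synthetic stand-in for relative-abundance profiles (rows sum to 1, sparse-ish).
rng = np.random.default_rng(0)
X = rng.dirichlet(np.full(20, 0.3), size=80)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.05 * rng.standard_normal(80)
y = (y - y.mean()) / y.std()  # standardize so the unit prior variance is sensible

X_tr, X_te, y_tr, y_te = X[:60], X[60:], y[:60], y[60:]
mean, std = gp_predict(X_tr, y_tr, X_te)
coverage = empirical_coverage(y_te, mean, std, level=0.95)
print(f"empirical coverage of nominal 95% intervals: {coverage:.2f}")
```

Comparing `coverage` with the nominal level at several values of `level` gives a simple calibration check of the kind the summary describes. In a well-calibrated model the two should track each other, though on a small synthetic test set the estimate is noisy.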

Availability and implementation: The code is available at https://github.com/asahiadachi/gp4microbiome.
