人类微生物组研究中差异丰度测试和混杂因素调整的现实基准

IF 10.1 1区 生物学 Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY
Jakob Wirbel, Morgan Essex, Sofia Kirke Forslund, Georg Zeller
{"title":"人类微生物组研究中差异丰度测试和混杂因素调整的现实基准","authors":"Jakob Wirbel, Morgan Essex, Sofia Kirke Forslund, Georg Zeller","doi":"10.1186/s13059-024-03390-9","DOIUrl":null,"url":null,"abstract":"In microbiome disease association studies, it is a fundamental task to test which microbes differ in their abundance between groups. Yet, consensus on suitable or optimal statistical methods for differential abundance testing is lacking, and it remains unexplored how these cope with confounding. Previous differential abundance benchmarks relying on simulated datasets did not quantitatively evaluate the similarity to real data, which undermines their recommendations. Our simulation framework implants calibrated signals into real taxonomic profiles, including signals mimicking confounders. Using several whole meta-genome and 16S rRNA gene amplicon datasets, we validate that our simulated data resembles real data from disease association studies much more than in previous benchmarks. With extensively parametrized simulations, we benchmark the performance of nineteen differential abundance methods and further evaluate the best ones on confounded simulations. Only classic statistical methods (linear models, the Wilcoxon test, t-test), limma, and fastANCOM properly control false discoveries at relatively high sensitivity. When additionally considering confounders, these issues are exacerbated, but we find that adjusted differential abundance testing can effectively mitigate them. In a large cardiometabolic disease dataset, we showcase that failure to account for covariates such as medication causes spurious association in real-world applications. Tight error control is critical for microbiome association studies. The unsatisfactory performance of many differential abundance methods and the persistent danger of unchecked confounding suggest these contribute to a lack of reproducibility among such studies. We have open-sourced our simulation and benchmarking software to foster a much-needed consolidation of statistical methodology for microbiome research.\n","PeriodicalId":12611,"journal":{"name":"Genome Biology","volume":"37 1","pages":""},"PeriodicalIF":10.1000,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A realistic benchmark for differential abundance testing and confounder adjustment in human microbiome studies\",\"authors\":\"Jakob Wirbel, Morgan Essex, Sofia Kirke Forslund, Georg Zeller\",\"doi\":\"10.1186/s13059-024-03390-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In microbiome disease association studies, it is a fundamental task to test which microbes differ in their abundance between groups. Yet, consensus on suitable or optimal statistical methods for differential abundance testing is lacking, and it remains unexplored how these cope with confounding. Previous differential abundance benchmarks relying on simulated datasets did not quantitatively evaluate the similarity to real data, which undermines their recommendations. Our simulation framework implants calibrated signals into real taxonomic profiles, including signals mimicking confounders. Using several whole meta-genome and 16S rRNA gene amplicon datasets, we validate that our simulated data resembles real data from disease association studies much more than in previous benchmarks. With extensively parametrized simulations, we benchmark the performance of nineteen differential abundance methods and further evaluate the best ones on confounded simulations. Only classic statistical methods (linear models, the Wilcoxon test, t-test), limma, and fastANCOM properly control false discoveries at relatively high sensitivity. When additionally considering confounders, these issues are exacerbated, but we find that adjusted differential abundance testing can effectively mitigate them. In a large cardiometabolic disease dataset, we showcase that failure to account for covariates such as medication causes spurious association in real-world applications. Tight error control is critical for microbiome association studies. The unsatisfactory performance of many differential abundance methods and the persistent danger of unchecked confounding suggest these contribute to a lack of reproducibility among such studies. We have open-sourced our simulation and benchmarking software to foster a much-needed consolidation of statistical methodology for microbiome research.\\n\",\"PeriodicalId\":12611,\"journal\":{\"name\":\"Genome Biology\",\"volume\":\"37 1\",\"pages\":\"\"},\"PeriodicalIF\":10.1000,\"publicationDate\":\"2024-09-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Genome Biology\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s13059-024-03390-9\",\"RegionNum\":1,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOTECHNOLOGY & APPLIED MICROBIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genome Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13059-024-03390-9","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOTECHNOLOGY & APPLIED MICROBIOLOGY","Score":null,"Total":0}
引用次数: 0

摘要

在微生物组疾病关联研究中,一项基本任务是测试不同组别之间哪些微生物的丰度不同。然而,对于差异丰度测试的合适或最佳统计方法还缺乏共识,这些方法如何应对混杂因素也仍未得到探讨。以前的差异丰度基准依赖于模拟数据集,但并未对其与真实数据的相似性进行定量评估,这削弱了它们的建议。我们的模拟框架将经过校准的信号植入真实的分类概况中,包括模拟混杂因素的信号。通过使用几个全元基因组和 16S rRNA 基因扩增片段数据集,我们验证了我们的模拟数据与疾病关联研究的真实数据的相似性远高于之前的基准数据。通过广泛的参数化模拟,我们对 19 种差异丰度方法的性能进行了基准测试,并进一步评估了混杂模拟中的最佳方法。只有经典统计方法(线性模型、Wilcoxon 检验、t 检验)、limma 和 fastANCOM 能以相对较高的灵敏度正确控制错误发现。当额外考虑混杂因素时,这些问题就会加剧,但我们发现调整后的差异丰度检验可以有效缓解这些问题。在一个大型心脏代谢疾病数据集中,我们展示了在实际应用中,如果不考虑药物治疗等协变量,就会导致虚假关联。严格的误差控制对微生物组关联研究至关重要。许多差异丰度方法的性能不尽如人意,而且未考虑混杂因素的危险持续存在,这些都是导致此类研究缺乏可重复性的原因。我们已经开源了我们的模拟和基准测试软件,以促进微生物组研究统计方法的整合。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A realistic benchmark for differential abundance testing and confounder adjustment in human microbiome studies
In microbiome disease association studies, it is a fundamental task to test which microbes differ in their abundance between groups. Yet, consensus on suitable or optimal statistical methods for differential abundance testing is lacking, and it remains unexplored how these cope with confounding. Previous differential abundance benchmarks relying on simulated datasets did not quantitatively evaluate the similarity to real data, which undermines their recommendations. Our simulation framework implants calibrated signals into real taxonomic profiles, including signals mimicking confounders. Using several whole meta-genome and 16S rRNA gene amplicon datasets, we validate that our simulated data resembles real data from disease association studies much more than in previous benchmarks. With extensively parametrized simulations, we benchmark the performance of nineteen differential abundance methods and further evaluate the best ones on confounded simulations. Only classic statistical methods (linear models, the Wilcoxon test, t-test), limma, and fastANCOM properly control false discoveries at relatively high sensitivity. When additionally considering confounders, these issues are exacerbated, but we find that adjusted differential abundance testing can effectively mitigate them. In a large cardiometabolic disease dataset, we showcase that failure to account for covariates such as medication causes spurious association in real-world applications. Tight error control is critical for microbiome association studies. The unsatisfactory performance of many differential abundance methods and the persistent danger of unchecked confounding suggest these contribute to a lack of reproducibility among such studies. We have open-sourced our simulation and benchmarking software to foster a much-needed consolidation of statistical methodology for microbiome research.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Genome Biology
Genome Biology Biochemistry, Genetics and Molecular Biology-Genetics
CiteScore
21.00
自引率
3.30%
发文量
241
审稿时长
2 months
期刊介绍: Genome Biology stands as a premier platform for exceptional research across all domains of biology and biomedicine, explored through a genomic and post-genomic lens. With an impressive impact factor of 12.3 (2022),* the journal secures its position as the 3rd-ranked research journal in the Genetics and Heredity category and the 2nd-ranked research journal in the Biotechnology and Applied Microbiology category by Thomson Reuters. Notably, Genome Biology holds the distinction of being the highest-ranked open-access journal in this category. Our dedicated team of highly trained in-house Editors collaborates closely with our esteemed Editorial Board of international experts, ensuring the journal remains on the forefront of scientific advances and community standards. Regular engagement with researchers at conferences and institute visits underscores our commitment to staying abreast of the latest developments in the field.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信