The unresolved struggle of 16S rRNA amplicon sequencing: a benchmarking analysis of clustering and denoising methods

IF 5.4 | Zone 2, Environmental Science and Ecology | Q1 GENETICS & HEREDITY

Environmental Microbiome | Pub Date: 2025-05-13 | DOI: 10.1186/s40793-025-00705-6

Mohamed Fares, Engy K Tharwat, Ilse Cleenwerck, Pieter Monsieurs, Rob Van Houdt, Peter Vandamme, Mohamed El-Hadidi, Mohamed Mysara

Environmental Microbiome, 20(1), 51. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12076876/pdf/

Citations: 0

Abstract

Background: Although 16S rRNA gene amplicon sequencing has become an indispensable method for microbiome studies, the analysis is not error-free and remains prone to several biases and errors. Numerous algorithms have been developed to remove these errors and consolidate the output into distance-based Operational Taxonomic Units (OTUs) or denoising-based Amplicon Sequence Variants (ASVs). However, differing experimental setups and parameters have obscured an objective comparison between them. In the present study, we conducted a comprehensive benchmarking analysis of error rates, microbial composition, over-merging and over-splitting of reference sequences, and diversity. The analysis used the most complex mock community (227 bacterial strains) together with the Mockrobiota database. By applying unified preprocessing steps, we objectively compared DADA2, Deblur, MED, UNOISE3, UPARSE, DGC (Distance-based Greedy Clustering), AN (Average Neighborhood), and Opticlust.
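The notion of a distance-based OTU can be made concrete with abundance-sorted greedy clustering at a fixed identity threshold. The sketch below uses a default threshold of 97%, the conventional OTU cutoff. It is a simplified illustration under stated assumptions, not the DGC, UPARSE, or any other benchmarked implementation. Identity is computed here as a Hamming fraction on equal-length toy sequences, whereas real tools align reads and add chimera and error filtering. The function names `identity` and `greedy_cluster` are hypothetical.

```python
"""Toy distance-based greedy clustering of amplicon reads into OTUs.

Simplified illustration only (hypothetical helper names): real OTU
pipelines align reads and filter chimeras; this sketch uses Hamming
identity on equal-length sequences.
"""
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity requires equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(reads: list[str], threshold: float = 0.97) -> dict[str, list[str]]:
    """Assign each unique sequence to the first centroid within `threshold` identity.

    Unique sequences are processed in decreasing abundance, so the most
    abundant sequence in a neighbourhood becomes that OTU's centroid.
    """
    counts = Counter(reads)
    # Sort by abundance (descending), then lexicographically for determinism.
    uniques = sorted(counts, key=lambda s: (-counts[s], s))
    clusters: dict[str, list[str]] = {}
    for seq in uniques:
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            # No existing centroid is close enough: this sequence founds a new OTU.
            clusters[seq] = [seq]
    return clusters


if __name__ == "__main__":
    reads = (["A" * 100] * 5 + ["A" * 99 + "C"]
             + ["C" * 100] * 3 + ["A" * 96 + "CCCC"] * 2)
    print(len(greedy_cluster(reads, 0.97)), len(greedy_cluster(reads, 0.95)))
```

In this toy input, a single-mismatch variant (99% identity) joins the first OTU at 0.97. A four-mismatch variant (96% identity) forms its own OTU. Lowering `threshold` to 0.95 folds that second variant into the first OTU as well. If that variant were a genuinely distinct strain in a mock community, the result would be the kind of over-merging the abstract attributes to OTU methods.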

Results: ASV algorithms, led by DADA2, produced consistent output but suffered from over-splitting. OTU algorithms, led by UPARSE, achieved clusters with lower error rates but showed more over-merging. Notably, UPARSE and DADA2 most closely resembled the expected microbial community, especially in alpha- and beta-diversity measures.
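The comparison above turns on two error types measured against a known mock community (over-splitting and over-merging) and on alpha and beta diversity. The following is a minimal sketch of how such quantities can be tallied. It assumes a hypothetical input format that maps each output unit to the set of reference IDs whose reads it contains. It counts a reference spread across more than one unit as over-split, and a unit that absorbs more than one reference as over-merged. This is one plausible way to operationalize the terms, not necessarily the study's definition. The Shannon index and Bray-Curtis dissimilarity serve as common examples of alpha- and beta-diversity measures; the study may have used other metrics.

```python
"""Hypothetical tallies of over-splitting/over-merging plus two diversity measures."""
from collections import defaultdict
from math import log


def split_merge_counts(unit_refs: dict[str, set[str]]) -> tuple[int, int]:
    """Return (over-split references, over-merged units).

    `unit_refs` maps each output unit (OTU or ASV) to the set of mock
    reference IDs whose reads it contains (assumed input format).
    """
    units_per_ref: dict[str, int] = defaultdict(int)
    for refs in unit_refs.values():
        for ref in refs:
            units_per_ref[ref] += 1
    # A reference represented by several units has been split.
    over_split = sum(1 for n in units_per_ref.values() if n > 1)
    # A unit holding several references has merged them.
    over_merged = sum(1 for refs in unit_refs.values() if len(refs) > 1)
    return over_split, over_merged


def shannon(counts: list[int]) -> float:
    """Shannon alpha diversity H' = -sum(p_i * ln p_i) over non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def bray_curtis(u: list[int], v: list[int]) -> float:
    """Bray-Curtis dissimilarity between abundance vectors over the same taxa."""
    if len(u) != len(v):
        raise ValueError("vectors must cover the same taxa")
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0
```

For example, with units `{"u1": {"r1"}, "u2": {"r1"}, "u3": {"r2", "r3"}, "u4": {"r4"}}`, reference `r1` is over-split and unit `u3` is over-merged, so `split_merge_counts` reports one of each.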

Conclusion: Using a wide range of mock datasets, our unbiased comparative evaluation examined the performance of eight algorithms for analyzing 16S rRNA amplicon sequences. The analysis sheds light on the strengths and weaknesses of each algorithm and on the accuracy of the resulting OTUs or ASVs. The most complex mock community and the benchmarking comparison presented here provide a framework for comparing OTU/ASV algorithms and an objective method for assessing new tools and algorithms.

Source journal: Environmental Microbiome (Immunology and Microbiology - Microbiology)
CiteScore: 7.40
Self-citation rate: 2.50%
Review time: 13 weeks

About the journal: Microorganisms, omnipresent across Earth's diverse environments, play a crucial role in adapting to external changes, influencing Earth's systems and cycles, and contributing significantly to agricultural practices. Through applied microbiology, they offer solutions to various everyday needs. Environmental Microbiome recognizes the universal presence and significance of microorganisms, inviting submissions that explore the diverse facets of environmental and applied microbiological research.