Mohamed Fares, Engy K Tharwat, Ilse Cleenwerck, Pieter Monsieurs, Rob Van Houdt, Peter Vandamme, Mohamed El-Hadidi, Mohamed Mysara
Environmental Microbiome 20(1):51. Published 2025-05-13. DOI: 10.1186/s40793-025-00705-6 (https://doi.org/10.1186/s40793-025-00705-6)
The unresolved struggle of 16S rRNA amplicon sequencing: a benchmarking analysis of clustering and denoising methods.
Background: Although 16S rRNA gene amplicon sequencing has become an indispensable method for microbiome studies, it remains prone to several biases and errors. Numerous algorithms have been developed to remove these errors and consolidate the output into distance-based Operational Taxonomic Units (OTUs) or denoising-based Amplicon Sequence Variants (ASVs). An objective comparison between them has been obscured by differing experimental setups and parameters. In the present study, we conducted a comprehensive benchmarking analysis of error rates, microbial composition, over-merging/over-splitting of reference sequences, and diversity analyses, using the most complex mock community, comprising 227 bacterial strains, together with the Mockrobiota database. Using unified preprocessing steps, we were able to compare DADA2, Deblur, MED, UNOISE3, UPARSE, DGC (Distance-based Greedy Clustering), AN (Average Neighborhood), and Opticlust objectively.
Results: ASV algorithms, led by DADA2, produced consistent output yet suffered from over-splitting, while OTU algorithms, led by UPARSE, achieved clusters with lower errors but with more over-merging. Notably, UPARSE and DADA2 most closely resembled the intended microbial community, especially on measures of alpha and beta diversity.
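The over-splitting and over-merging mentioned in the results can be illustrated with a minimal sketch. It assumes, purely for illustration, that each output unit (OTU or ASV) has already been matched to the set of reference strains it represents; the function name `split_merge_counts`, the input format, and the toy data are hypothetical and are not taken from the study, which may define these measures differently.

```python
from collections import defaultdict


def split_merge_counts(assignments):
    """Return (over_split_refs, over_merged_units).

    assignments: dict mapping an output unit id (OTU or ASV) to the set of
    reference strain ids it was matched to (hypothetical input format).
    A reference is treated as over-split when more than one unit maps to it;
    a unit is treated as over-merged when it maps to more than one reference.
    """
    units_per_ref = defaultdict(set)
    for unit, refs in assignments.items():
        for ref in refs:
            units_per_ref[ref].add(unit)

    over_split = sorted(ref for ref, units in units_per_ref.items() if len(units) > 1)
    over_merged = sorted(unit for unit, refs in assignments.items() if len(refs) > 1)
    return over_split, over_merged


# Toy data (invented): strain_A is split across two units,
# and unit_3 merges strain_B with strain_C.
toy = {
    "unit_1": {"strain_A"},
    "unit_2": {"strain_A"},
    "unit_3": {"strain_B", "strain_C"},
    "unit_4": {"strain_D"},
}
over_split, over_merged = split_merge_counts(toy)
print(over_split)   # ['strain_A']
print(over_merged)  # ['unit_3']
```

Under these assumptions, a denoiser that reports many near-identical variants of one strain would inflate the first list, while a clustering threshold that is too loose would inflate the second.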
Conclusion: Our unbiased comparative evaluation examined the performance of eight algorithms dedicated to the analysis of 16S rRNA amplicon sequences across a wide range of mock datasets. Our analysis sheds light on the pros and cons of each algorithm and on the accuracy of the OTUs or ASVs they produce. The use of the most complex mock community, together with the benchmarking comparison presented here, offers a framework for comparing OTU/ASV algorithms and an objective method for assessing new tools and algorithms.
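One ingredient of such an assessment, comparing the alpha and beta diversity of an algorithm's output with the intended mock community, can be sketched as follows. The abstract does not say which diversity measures were used, so the Shannon index and Bray-Curtis dissimilarity here are assumptions, and the counts are invented toy values.

```python
import math


def shannon(counts):
    """Shannon index (natural log) of a dict of taxon -> count."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two dicts of taxon -> count."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    den = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return num / den if den else 0.0


# Toy example (invented): the observed profile contains one spurious unit
# that takes half of strain_B's reads.
expected = {"strain_A": 50, "strain_B": 50}
observed = {"strain_A": 50, "strain_B": 25, "spurious": 25}

print(round(shannon(expected), 4))      # 0.6931
print(round(shannon(observed), 4))      # 1.0397
print(bray_curtis(expected, observed))  # 0.25
```

In this toy case the spurious unit raises the apparent alpha diversity and moves the sample away from the expected composition, which is the kind of distortion a mock-community benchmark is meant to expose.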
About the journal:
Microorganisms, omnipresent across Earth's diverse environments, play a crucial role in adapting to external changes, influencing Earth's systems and cycles, and contributing significantly to agricultural practices. Through applied microbiology, they offer solutions to various everyday needs. Environmental Microbiome recognizes the universal presence and significance of microorganisms, inviting submissions that explore the diverse facets of environmental and applied microbiological research.