A Comprehensive Evaluation of Taxonomic Classifiers in Marine Vertebrate eDNA Studies
Philipp E Bayer, Adam Bennett, Georgia Nester, Shannon Corrigan, Eric J Raes, Madalyn Cooper, Marcelle E Ayad, Philip McVey, Anya Kardailsky, Jessica Pearce, Matthew W Fraser, Priscila Goncalves, Stephen Burnell, Sebastian Rauschert
Molecular Ecology Resources, e14107. Journal Article. Published 2025-04-17.
DOI: 10.1111/1755-0998.14107 (https://doi.org/10.1111/1755-0998.14107)
Abstract
Environmental DNA (eDNA) metabarcoding is a widely used tool for surveying marine vertebrate biodiversity. To this end, many computational tools have been released and a plethora of bioinformatic approaches are used for eDNA-based community composition analysis. Simulation studies and careful evaluation of taxonomic classifiers are essential to establish reliable benchmarks to improve the accuracy and reproducibility of eDNA-based findings. Here we present a comprehensive evaluation of nine taxonomic classifiers exploring three widely used mitochondrial markers (12S rDNA, 16S rDNA and COI) in Australian marine vertebrates. Curated reference databases and exclusion database tests were used to simulate diverse species compositions, including three positive control and two negative control datasets. Using these simulated datasets ranging from 36 to 302 marker genes, we were able to identify between 19% and 89% of marine vertebrate species using mitochondrial markers. We show that MMSeqs2 and Metabuli generally outperform BLAST with 10% and 11% higher F1 scores for 12S and 16S rDNA markers, respectively, and that Naive Bayes Classifiers such as Mothur outperform sequence-based classifiers except MMSeqs2 for COI markers by 11%. Database exclusion tests reveal that MMSeqs2 and BLAST are less susceptible to false positives compared to Kraken2 with default parameters. Based on these findings, we recommend that MMSeqs2 is used for taxonomic classification of marine vertebrates given its ability to improve species-level assignments while reducing the number of false positives. Our work contributes to the establishment of best practices in eDNA-based biodiversity analysis to ultimately increase the reliability of this monitoring tool in the context of marine vertebrate conservation.
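The F1 comparisons reported in the abstract can be illustrated with a minimal sketch. It assumes that each classifier is scored per simulated dataset by comparing the set of species it reports against the set of species known to be present; the abstract does not spell out the exact scoring procedure, so this is an illustration rather than the authors' pipeline. The function name `species_f1` and the example species lists are hypothetical.

```python
def species_f1(expected, predicted):
    """Return (precision, recall, F1) for species-level assignments.

    expected:  species known to be in the simulated dataset
    predicted: species reported by a classifier
    Both arguments are hypothetical inputs used for illustration only.
    """
    expected, predicted = set(expected), set(predicted)
    tp = len(expected & predicted)   # species correctly recovered
    fp = len(predicted - expected)   # species reported but not present
    fn = len(expected - predicted)   # species present but missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: four species present, three reported, two of them correct.
truth = ["Chrysophrys auratus", "Carcharodon carcharias",
         "Tursiops aduncus", "Chelonia mydas"]
calls = ["Chrysophrys auratus", "Carcharodon carcharias",
         "Thunnus albacares"]

precision, recall, f1 = species_f1(truth, calls)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, a classifier that reports many spurious species is penalised as much as one that misses true species, which is why it is a natural summary for the false-positive behaviour discussed above.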
About the journal:
Molecular Ecology Resources promotes the creation of comprehensive resources for the scientific community, encompassing computer programs, statistical and molecular advancements, and a diverse array of molecular tools. Serving as a conduit for disseminating these resources, the journal targets a broad audience of researchers in the fields of evolution, ecology, and conservation. Articles in Molecular Ecology Resources are crafted to support investigations tackling significant questions within these disciplines.
In addition to original resource articles, Molecular Ecology Resources features Reviews, Opinions, and Comments relevant to the field. The journal also periodically releases Special Issues focusing on resource development within specific areas.