{"title":"gmmDenoise: A New Method and R Package for High-Confidence Sequence Variant Filtering in Environmental DNA Amplicon Analysis.","authors":"Yusuke Koseki, Hirohiko Takeshima, Ryuji Yoneda, Kaito Katayanagi, Gen Ito, Hiroki Yamanaka","doi":"10.1111/1755-0998.70023","DOIUrl":null,"url":null,"abstract":"<p><p>Assessing and monitoring genetic diversity is vital for understanding the ecology and evolution of natural populations but is often challenging in animal and plant species due to technically and physically demanding tissue sampling. Although environmental DNA (eDNA) metabarcoding is a promising alternative to the traditional population genetic monitoring based on biological samples, its practical application remains challenging due to spurious sequences present in the amplicon data, even after data processing with the existing sequence filtering and denoising (error correction) methods. Here we developed a novel amplicon filtering approach that can effectively eliminate such spurious amplicon sequence variants (ASVs) in eDNA metabarcoding data. A simple simulation of eDNA metabarcoding processes was performed to understand the patterns of read count (abundance) distributions of true ASVs and their polymerase chain reaction (PCR)-generated artefacts (i.e., false-positive ASVs). Based on the simulation results, the approach was developed to estimate the abundance distributions of true and false-positive ASVs using Gaussian mixture models and to determine a statistically based threshold between them. The developed approach was implemented as an R package, gmmDenoise and evaluated using single-species metabarcoding datasets in which all or some true ASVs (i.e., haplotypes) were known. Example analyses using community (multi-species) metabarcoding datasets were also performed to demonstrate how gmmDenoise can be used to derive reliable intraspecific diversity estimates and population genetic inferences from noisy amplicon sequencing data. 
The gmmDenoise package is freely available in the GitHub repository (https://github.com/YSKoseki/gmmDenoise).</p>","PeriodicalId":211,"journal":{"name":"Molecular Ecology Resources","volume":" ","pages":"e70023"},"PeriodicalIF":5.5000,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Molecular Ecology Resources","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1111/1755-0998.70023","RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
Citation count: 0
Abstract
Assessing and monitoring genetic diversity is vital for understanding the ecology and evolution of natural populations but is often challenging in animal and plant species because tissue sampling is technically and physically demanding. Although environmental DNA (eDNA) metabarcoding is a promising alternative to traditional population genetic monitoring based on biological samples, its practical application remains challenging because spurious sequences persist in the amplicon data, even after processing with existing sequence filtering and denoising (error correction) methods. Here, we developed a novel amplicon filtering approach that can effectively eliminate such spurious amplicon sequence variants (ASVs) in eDNA metabarcoding data. A simple simulation of eDNA metabarcoding processes was performed to understand the patterns of read count (abundance) distributions of true ASVs and their polymerase chain reaction (PCR)-generated artefacts (i.e., false-positive ASVs). Based on the simulation results, the approach was developed to estimate the abundance distributions of true and false-positive ASVs using Gaussian mixture models and to determine a statistically based threshold between them. The developed approach was implemented as an R package, gmmDenoise, and evaluated using single-species metabarcoding datasets in which all or some true ASVs (i.e., haplotypes) were known. Example analyses using community (multi-species) metabarcoding datasets were also performed to demonstrate how gmmDenoise can be used to derive reliable intraspecific diversity estimates and population genetic inferences from noisy amplicon sequencing data. The gmmDenoise package is freely available in the GitHub repository (https://github.com/YSKoseki/gmmDenoise).
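The general idea of separating true from false-positive ASVs with a Gaussian mixture can be illustrated with a minimal sketch. It is not the gmmDenoise implementation and does not use its API. It assumes exactly two components fitted to log10 read counts, a hand-written EM loop, and an equal-posterior cut-off as the threshold rule; the package may choose the number of components and the threshold differently. The read counts are synthetic toy values generated for illustration only, and all function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(xs, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns [(weight, mean, sd), (weight, mean, sd)] sorted by mean.
    """
    n = len(xs)
    grand_mean = sum(xs) / n
    grand_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n)
    # Assumption: start the two means at the data extremes.
    means = [min(xs), max(xs)]
    sds = [grand_sd, grand_sd]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and sd of each component.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsing component
    order = sorted(range(2), key=lambda k: means[k])
    return [(weights[k], means[k], sds[k]) for k in order]


def equal_posterior_threshold(low, high, tol=1e-9):
    """Bisect between the component means for the point where both
    components are equally probable (assumed threshold rule)."""
    (w1, m1, s1), (w2, m2, s2) = low, high

    def diff(x):
        return w2 * normal_pdf(x, m2, s2) - w1 * normal_pdf(x, m1, s1)

    lo, hi = m1, m2
    if diff(lo) >= 0 or diff(hi) <= 0:
        raise ValueError("components are not separated enough to define a threshold")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diff(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Synthetic toy data (NOT from the paper): many low-abundance artefacts
# and a few high-abundance true ASVs, on a log10 read-count scale.
random.seed(1)
artefact_log_counts = [random.gauss(1.0, 0.3) for _ in range(200)]
true_log_counts = [random.gauss(3.0, 0.3) for _ in range(20)]
log_counts = artefact_log_counts + true_log_counts

components = fit_two_gaussians(log_counts)
threshold = equal_posterior_threshold(components[0], components[1])
kept = [x for x in log_counts if x > threshold]

print(f"threshold (log10 reads): {threshold:.2f}")
print(f"ASVs kept: {len(kept)} of {len(log_counts)}")
```

Working on the log scale is a choice made for this sketch, since read counts typically span several orders of magnitude. On real data the two distributions may overlap far more than in this toy example, and a stricter cut-off than the equal-posterior point may be preferable.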
About the journal:
Molecular Ecology Resources promotes the creation of comprehensive resources for the scientific community, encompassing computer programs, statistical and molecular advancements, and a diverse array of molecular tools. Serving as a conduit for disseminating these resources, the journal targets a broad audience of researchers in the fields of evolution, ecology, and conservation. Articles in Molecular Ecology Resources are crafted to support investigations tackling significant questions within these disciplines.
In addition to original resource articles, Molecular Ecology Resources features Reviews, Opinions, and Comments relevant to the field. The journal also periodically releases Special Issues focusing on resource development within specific areas.