gmmDenoise: A New Method and R Package for High-Confidence Sequence Variant Filtering in Environmental DNA Amplicon Analysis.

Impact factor 5.5 | CAS Zone 1 (Biology) | JCR Q1 (Biochemistry & Molecular Biology)
Yusuke Koseki, Hirohiko Takeshima, Ryuji Yoneda, Kaito Katayanagi, Gen Ito, Hiroki Yamanaka
Molecular Ecology Resources, e70023. Journal Article. Publication date: 2025-08-04. DOI: 10.1111/1755-0998.70023
Citations: 0

Abstract

Assessing and monitoring genetic diversity is vital for understanding the ecology and evolution of natural populations but is often challenging in animal and plant species due to technically and physically demanding tissue sampling. Although environmental DNA (eDNA) metabarcoding is a promising alternative to traditional population genetic monitoring based on biological samples, its practical application remains challenging due to spurious sequences present in the amplicon data, even after data processing with existing sequence filtering and denoising (error correction) methods. Here we developed a novel amplicon filtering approach that can effectively eliminate such spurious amplicon sequence variants (ASVs) in eDNA metabarcoding data. A simple simulation of eDNA metabarcoding processes was performed to understand the patterns of read count (abundance) distributions of true ASVs and their polymerase chain reaction (PCR)-generated artefacts (i.e., false-positive ASVs). Based on the simulation results, the approach was developed to estimate the abundance distributions of true and false-positive ASVs using Gaussian mixture models and to determine a statistically based threshold between them. The developed approach was implemented as an R package, gmmDenoise, and evaluated using single-species metabarcoding datasets in which all or some true ASVs (i.e., haplotypes) were known. Example analyses using community (multi-species) metabarcoding datasets were also performed to demonstrate how gmmDenoise can be used to derive reliable intraspecific diversity estimates and population genetic inferences from noisy amplicon sequencing data. The gmmDenoise package is freely available in the GitHub repository (https://github.com/YSKoseki/gmmDenoise).
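The abstract states the core idea (fit Gaussian mixture models to ASV abundance distributions and place a statistically based cutoff between the true and false-positive components) without giving fitting details. The Python sketch below illustrates that general idea under assumptions that are not taken from the paper: abundances are log10-transformed read counts, a fixed two-component mixture is fitted by expectation-maximization, and the cutoff is the abundance at which the upper ("true ASV") component's posterior probability reaches 0.5. The toy data generator and all function names (`simulate_log_abundances`, `fit_gmm_1d`, `gmm_threshold`) are hypothetical; this is not gmmDenoise's code or API, which is an R package.

```python
# Hypothetical sketch of GMM-based abundance thresholding; NOT the gmmDenoise implementation.
import math
import random

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def log_weighted_density(x, w, mu, sd):
    """log(w * N(x; mu, sd^2)), computed in log space to avoid underflow."""
    z = (x - mu) / sd
    return math.log(w) - math.log(sd) - LOG_SQRT_2PI - 0.5 * z * z


def fit_gmm_1d(xs, k=2, max_iter=500, tol=1e-9):
    """Fit a k-component 1-D Gaussian mixture to xs by expectation-maximization.

    Returns (weights, means, sds), sorted by ascending mean.
    """
    n = len(xs)
    lo, hi = min(xs), max(xs)
    # Start with means spread evenly over the data range, equal weights and SDs.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    sds = [(hi - lo) / (2 * k)] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of each component for each observation.
        resp, ll = [], 0.0
        for x in xs:
            logs = [log_weighted_density(x, w, m, s) for w, m, s in zip(weights, means, sds)]
            top = max(logs)
            dens = [math.exp(v - top) for v in logs]  # log-sum-exp trick
            total = sum(dens)
            ll += top + math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and SDs from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = math.sqrt(max(var, 1e-6))  # floor guards against collapse
        if ll - prev_ll < tol:  # EM log-likelihood is non-decreasing
            break
        prev_ll = ll
    order = sorted(range(k), key=lambda j: means[j])
    return ([weights[j] for j in order], [means[j] for j in order],
            [sds[j] for j in order])


def gmm_threshold(weights, means, sds, iters=100):
    """Abundance where the two highest-mean components are equally probable.

    Bisects between their means for w_hi*N_hi(x) == w_lo*N_lo(x), i.e. the
    point where the posterior of the upper ("true ASV") component is 0.5.
    """
    (w1, w2), (m1, m2), (s1, s2) = weights[-2:], means[-2:], sds[-2:]

    def f(x):
        return log_weighted_density(x, w2, m2, s2) - log_weighted_density(x, w1, m1, s1)

    a, b = m1, m2
    if not (f(a) < 0 < f(b)):
        raise ValueError("no posterior crossing between the two upper means")
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if f(mid) < 0:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)


def simulate_log_abundances(n_true=30, n_false=200, seed=42):
    """Hypothetical toy data: log10 read counts of true ASVs and PCR-error ASVs."""
    rng = random.Random(seed)

    def reads(mu, sd):
        return max(1, round(10 ** rng.gauss(mu, sd)))

    true_reads = [reads(3.5, 0.3) for _ in range(n_true)]    # abundant haplotypes
    false_reads = [reads(1.2, 0.3) for _ in range(n_false)]  # rare artefacts
    return [math.log10(c) for c in true_reads + false_reads]


if __name__ == "__main__":
    xs = simulate_log_abundances()
    w, m, s = fit_gmm_1d(xs, k=2)
    thr = gmm_threshold(w, m, s)
    kept = sum(x >= thr for x in xs)
    print(f"threshold = {10 ** thr:.0f} reads; {kept} ASVs retained")
```

Fixing k = 2 is a simplification for illustration; on real data the number of components would itself need to be chosen (for example by comparing fits with an information criterion), and the cutoff rule may differ from the one assumed here.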

Source journal: Molecular Ecology Resources (Biology / Evolutionary Biology)
CiteScore: 15.60
Self-citation rate: 5.20%
Articles published: 170
Review time: 3 months
Journal description: Molecular Ecology Resources promotes the creation of comprehensive resources for the scientific community, encompassing computer programs, statistical and molecular advancements, and a diverse array of molecular tools. Serving as a conduit for disseminating these resources, the journal targets a broad audience of researchers in the fields of evolution, ecology, and conservation. Articles in Molecular Ecology Resources are crafted to support investigations tackling significant questions within these disciplines. In addition to original resource articles, Molecular Ecology Resources features Reviews, Opinions, and Comments relevant to the field. The journal also periodically releases Special Issues focusing on resource development within specific areas.