Maximizing the reliability and the number of species assignments in metabarcoding studies using a curated regional library and a public repository

Audrey Bourret, Claude Nozères, Eric Parent, Geneviève J. Parent
Metabarcoding and Metagenomics. Published 2023-02-23. DOI: 10.3897/mbmg.7.98539 (https://doi.org/10.3897/mbmg.7.98539)

Abstract

Biodiversity assessments relying on DNA have increased rapidly over the last decade. However, the reliability of taxonomic assignments in metabarcoding studies is variable and is affected by the reference databases and the assignment methods used. Species-level assignments are usually considered reliable when using regional libraries but unreliable when using public repositories. In this study, we aimed to test this assumption for metazoan species detected in the Gulf of St. Lawrence in the Northwest Atlantic. We first created a regional library (GSL-rl) by data mining COI barcode sequences from BOLD, and included a reliability ranking system for species assignments. We then 1) estimated the accuracy and precision of the public repository NCBI-nt for species assignments using sequences from the regional library and 2) compared the detection and reliability of species assignments of a metabarcoding dataset using either NCBI-nt or the regional library with popular assignment methods. With NCBI-nt and sequences from the regional library, the BLAST-LCA (least common ancestor) method was the most precise method for species assignments, but accuracy was higher with the BLAST-TopHit method (>80% over all taxa, between 70% and 90% amongst taxonomic groups). With the metabarcoding dataset, the reliability of species assignments was greater using GSL-rl than using NCBI-nt. However, we also observed that the total number of reliable species assignments could be maximized using both GSL-rl and NCBI-nt with different optimized assignment methods. The use of a two-step approach for species assignments, i.e., using a regional library and a public repository, could improve the reliability and the number of detected species in metabarcoding studies.
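To make the difference between the two BLAST-based methods concrete, the following is a minimal sketch and not the authors' pipeline. It assumes BLAST hits have already been parsed into a lineage plus a percent identity. The `Hit` class, the 97% identity threshold, the rank list, and the order of the two steps (regional library first, public repository only for sequences left without a species) are all illustrative assumptions that the abstract does not specify.

```python
from __future__ import annotations
from dataclasses import dataclass

# Ranks from broadest to finest; every lineage tuple follows this order (assumed layout).
RANKS = ("phylum", "class", "order", "family", "genus", "species")

@dataclass(frozen=True)
class Hit:
    """One parsed BLAST hit (hypothetical structure for this sketch)."""
    lineage: tuple[str, ...]  # one name per rank in RANKS
    identity: float           # percent identity of the alignment

def top_hit(hits, min_identity=97.0):
    """TopHit: return the lineage of the single best hit above the threshold."""
    passing = [h for h in hits if h.identity >= min_identity]
    if not passing:
        return None
    return max(passing, key=lambda h: h.identity).lineage

def lca(hits, min_identity=97.0):
    """LCA: return the lineage shared by all hits above the threshold."""
    passing = [h for h in hits if h.identity >= min_identity]
    if not passing:
        return None
    common = []
    # Walk down the ranks and stop at the first rank where the hits disagree.
    for names in zip(*(h.lineage for h in passing)):
        if len(set(names)) != 1:
            break
        common.append(names[0])
    return tuple(common)

def two_step(regional_hits, public_hits,
             regional_method=top_hit, public_method=lca, min_identity=97.0):
    """Try the regional library first, then fall back to the public repository.

    Only species-level results count as an assignment here. Which method is
    paired with which database is an assumption of this sketch.
    """
    for source, hits, method in (("regional", regional_hits, regional_method),
                                 ("public", public_hits, public_method)):
        lineage = method(hits, min_identity)
        if lineage is not None and len(lineage) == len(RANKS):
            return lineage, source
    return None, "unassigned"

# Illustrative lineages for two congeneric cod species.
COD = ("Chordata", "Actinopterygii", "Gadiformes", "Gadidae", "Gadus", "Gadus morhua")
PACIFIC_COD = COD[:5] + ("Gadus macrocephalus",)
```

With two near-identical hits from different species of the same genus, `top_hit` still returns a species while `lca` stops at the genus. This mirrors the trade-off described above: LCA makes fewer species-level calls but is less likely to make a wrong one, whereas TopHit always commits to the best match.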