BOLDistilled: Automated Construction of Comprehensive but Compact DNA Barcode Reference Libraries.

IF 5.5 | Zone 1, Biology | Q1, Biochemistry & Molecular Biology
S W J Prosser, R M Floyd, K A Thompson, S K Monckton, P D N Hebert
Molecular Ecology Resources, e70043. Journal Article. Published 2025-09-14. DOI: https://doi.org/10.1111/1755-0998.70043

Abstract

Advances in DNA sequencing technology have stimulated the rapid uptake of protocols, such as eDNA analysis and metabarcoding, that infer the species composition of environmental samples from DNA sequences. DNA barcode reference libraries play a critical role in the interpretation of sequences gathered through such protocols, but many of these libraries lack a taxonomic consensus, include redundant records, do not support end-user analytical pipelines, and are not permanently archived. Furthermore, because DNA sequencers are outpacing Moore's Law and reference libraries are growing, the computational power required to assign sequences to source taxa is rapidly increasing. This paper introduces an algorithmic approach to construct DNA barcode reference libraries that addresses these issues. Hosted online, 'BOLDistilled' libraries are comprehensive but compact, because the algorithm distills genetic variation into a minimal set of records. We provide a BOLDistilled library for the barcode region of the cytochrome c oxidase 1 gene (COI) based on data in the Barcode of Life Data System (BOLD). It contains 1.7 M records versus the 15.7 M in the complete library, a compression that reduced the time required for sequence analysis of metabarcoded samples by ≥ 98% with no reduction in the accuracy of taxonomic placements. BOLDistilled libraries will be updated regularly, with current and previous versions available at https://boldsystems.org/data/boldistilled. By providing access to persistent, comprehensive, and high-quality reference data, these libraries strengthen the capacity of DNA-based identification systems to advance biodiversity science.
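To put the record counts in the abstract into perspective, the short Python sketch below computes the fold reduction and the share of records removed when going from 15.7 M to 1.7 M records. It also includes a toy redundancy filter that drops exact duplicate sequences within a taxon. That filter is an assumption made purely for illustration: the abstract does not describe how BOLDistilled selects records, and the function names `compression_stats` and `drop_exact_duplicates` are hypothetical.

```python
def compression_stats(full_records, distilled_records):
    """Return (fold reduction, percent of records removed)."""
    fold = full_records / distilled_records
    percent_removed = 100.0 * (1.0 - distilled_records / full_records)
    return fold, percent_removed


def drop_exact_duplicates(records):
    """Toy redundancy filter (NOT the BOLDistilled algorithm).

    Keeps the first record seen for each (taxon, sequence) pair,
    comparing sequences case-insensitively. `records` is a list of
    (taxon, sequence) tuples; both fields are illustrative.
    """
    seen = set()
    kept = []
    for taxon, seq in records:
        key = (taxon, seq.upper())
        if key not in seen:
            seen.add(key)
            kept.append((taxon, seq))
    return kept


# Record counts as reported in the abstract.
fold, pct = compression_stats(15.7e6, 1.7e6)
print(f"{fold:.1f}-fold fewer records, {pct:.1f}% of records removed")

# Hypothetical mini-library: two identical sequences for taxon A.
toy = [("A", "acgt"), ("A", "ACGT"), ("B", "ACGT")]
print(drop_exact_duplicates(toy))
```

The reported reduction in analysis time (≥ 98%) is larger than the reduction in record count alone, so the two figures should not be treated as interchangeable.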

Source journal
Molecular Ecology Resources (Biology: Evolutionary Biology)
CiteScore: 15.60
Self-citation rate: 5.20%
Publication volume: 170
Review time: 3 months
About the journal: Molecular Ecology Resources promotes the creation of comprehensive resources for the scientific community, encompassing computer programs, statistical and molecular advancements, and a diverse array of molecular tools. Serving as a conduit for disseminating these resources, the journal targets a broad audience of researchers in the fields of evolution, ecology, and conservation. Articles in Molecular Ecology Resources are crafted to support investigations tackling significant questions within these disciplines. In addition to original resource articles, Molecular Ecology Resources features Reviews, Opinions, and Comments relevant to the field. The journal also periodically releases Special Issues focusing on resource development within specific areas.