S W J Prosser, R M Floyd, K A Thompson, S K Monckton, P D N Hebert
BOLDistilled: Automated Construction of Comprehensive but Compact DNA Barcode Reference Libraries
Molecular Ecology Resources, article e70043. Published 2025-09-14. DOI: https://doi.org/10.1111/1755-0998.70043
Journal impact factor: 5.5 (JCR Q1, Biochemistry & Molecular Biology)
Citations: 0
Abstract
Advances in DNA sequencing technology have stimulated the rapid uptake of protocols, such as eDNA analysis and metabarcoding, that infer the species composition of environmental samples from DNA sequences. DNA barcode reference libraries play a critical role in the interpretation of sequences gathered through such protocols, but many of these libraries lack a taxonomic consensus, include redundant records, do not support end-user analytical pipelines, and are not permanently archived. Furthermore, because DNA sequencers are outpacing Moore's Law and reference libraries are growing, the computational power required to assign sequences to source taxa is rapidly increasing. This paper introduces an algorithmic approach to construct DNA barcode reference libraries that addresses these issues. Hosted online, 'BOLDistilled' libraries are comprehensive but compact, because the algorithm distills genetic variation into a minimal set of records. We provide a BOLDistilled library for the barcode region of the cytochrome c oxidase 1 gene (COI) based on data in the Barcode of Life Data System (BOLD). It contains 1.7 M records versus the 15.7 M in the complete library, a compression that reduced the time required for sequence analysis of metabarcoded samples by ≥ 98% with no reduction in the accuracy of taxonomic placements. BOLDistilled libraries will be updated regularly, with current and previous versions available at https://boldsystems.org/data/boldistilled. By providing access to persistent, comprehensive, and high-quality reference data, these libraries strengthen the capacity of DNA-based identification systems to advance biodiversity science.
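To put the record counts above in concrete terms, the short Python sketch below computes the fraction of records removed and shows a toy form of redundancy reduction. It is only an illustration: the abstract does not describe the BOLDistilled algorithm, so the `dereplicate` function (exact-duplicate removal per taxon), the `reduction` helper, and the sample records are all hypothetical and are not the authors' method.

```python
def dereplicate(records):
    """Keep the first record for each distinct (taxon, sequence) pair.

    Toy illustration of removing redundant records by exact matching.
    This is an assumption for demonstration, NOT the BOLDistilled algorithm.
    """
    seen = set()
    kept = []
    for taxon, seq in records:
        key = (taxon, seq.upper())  # treat sequences case-insensitively
        if key not in seen:
            seen.add(key)
            kept.append((taxon, seq))
    return kept


def reduction(full_count, distilled_count):
    """Fraction of records removed when going from the full to the distilled library."""
    return 1 - distilled_count / full_count


# Hypothetical records: (taxon name, barcode fragment)
records = [
    ("Apis mellifera", "ACGTACGT"),
    ("Apis mellifera", "acgtacgt"),  # same sequence, different case: redundant
    ("Apis mellifera", "ACGTACGA"),  # distinct haplotype: kept
    ("Bombus terrestris", "ACGTACGT"),  # same sequence, different taxon: kept
]

distilled = dereplicate(records)
print(len(records), "->", len(distilled), "records")

# Record counts reported in the abstract: 15.7 M complete, 1.7 M distilled.
print(f"{reduction(15.7e6, 1.7e6):.1%} of records removed")
```

With the counts reported in the abstract, roughly 89% of records are removed. The ≥ 98% reduction in analysis time is a separate measurement reported by the authors and does not follow directly from this ratio.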
About the journal:
Molecular Ecology Resources promotes the creation of comprehensive resources for the scientific community, encompassing computer programs, statistical and molecular advancements, and a diverse array of molecular tools. Serving as a conduit for disseminating these resources, the journal targets a broad audience of researchers in the fields of evolution, ecology, and conservation. Articles in Molecular Ecology Resources are crafted to support investigations tackling significant questions within these disciplines.
In addition to original resource articles, Molecular Ecology Resources features Reviews, Opinions, and Comments relevant to the field. The journal also periodically releases Special Issues focusing on resource development within specific areas.