A Targeted Reference Database for Improved Analysis of Environmental 16S rRNA Oxford Nanopore Sequencing Data
Authors: Melcy Philip, Tonje Nilsen, Sanna Majaneva, Ragnhild Pettersen, Morten Stokkan, Jessica Louise Ray, Nigel Keeley, Knut Rudi, Lars-Gustav Snipen
Journal: Molecular Ecology Resources, e70036 (Journal Article)
Published: 2025-08-25
DOI: 10.1111/1755-0998.70036 (https://doi.org/10.1111/1755-0998.70036)
Journal impact factor: 5.5 (JCR Q1, Biochemistry & Molecular Biology)
Citation count: 0
Abstract
The Oxford Nanopore Technologies (ONT) sequencing platform is compact and efficient, making it suitable for rapid biodiversity assessments in remote areas. Despite its long reads, ONT has a higher error rate than other platforms, which makes high-quality reference databases necessary for accurate taxonomic assignment. However, the lack of targeted databases for underexplored habitats, such as the seafloor, limits ONT's broader applicability for exploratory analysis. To address this, we propose an approach for building environmentally targeted databases that improve ONT-based analysis of the 16S rRNA gene (16S), using seafloor sediment samples from the Norwegian coast as an example. We first used Illumina short-read data to create a database of full-length or near-full-length 16S sequences from seafloor samples. Amplicons were initially mapped to the SILVA database, and the matches were added to our database. Unmatched amplicons were reconstructed with the METASEED and Barrnap methodologies, using amplicon and metagenome data. Finally, where neither strategy succeeded, the short-read sequences themselves were included in the database. The result is AQUAeD-DB, which contains 14,545 16S sequences clustered at 95% identity. Comparative database analysis shows that AQUAeD-DB gives consistent results for Illumina and Nanopore read assignments (median correlation coefficient: 0.50), whereas a standard database showed a substantially weaker correlation. These findings also emphasise its potential to recognise both high- and low-abundance taxa, which could be key indicators in environmental studies. This work highlights the need for targeted databases in environmental analysis, especially for ONT-based studies, and lays the foundation for future extension of the database.
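The abstract describes a three-tier fallback for choosing the reference sequence that represents each Illumina amplicon. A SILVA match is preferred. If there is no match, a reconstructed full-length sequence is used. If reconstruction also fails, the short read itself goes into the database. The minimal Python sketch below shows only that precedence logic. The `Amplicon` record, its field names, and the `select_reference`/`build_database` helpers are hypothetical. The sketch assumes the SILVA mapping and the METASEED/Barrnap reconstruction have already run upstream. The subsequent 95% identity clustering is not shown.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Amplicon:
    # Hypothetical record for one Illumina amplicon; field names are illustrative.
    amplicon_id: str
    short_read: str                      # the short-read amplicon sequence itself
    silva_hit: Optional[str] = None      # matched SILVA sequence, if mapping succeeded
    reconstructed: Optional[str] = None  # full-length 16S rebuilt upstream, if any


def select_reference(amp: Amplicon) -> Tuple[str, str]:
    """Return (tier, sequence) using the fallback order SILVA > reconstruction > short read."""
    if amp.silva_hit is not None:
        return ("silva", amp.silva_hit)
    if amp.reconstructed is not None:
        return ("reconstructed", amp.reconstructed)
    # Last resort: keep the short-read sequence so the taxon is still represented.
    return ("short_read", amp.short_read)


def build_database(amplicons: Iterable[Amplicon]) -> Dict[str, Tuple[str, str]]:
    """Map each amplicon ID to the (tier, sequence) chosen for the database (before clustering)."""
    return {amp.amplicon_id: select_reference(amp) for amp in amplicons}


if __name__ == "__main__":
    # Toy sequences for illustration only; not data from the study.
    amps = [
        Amplicon("a1", "ACGT", silva_hit="AAACCCGGGTTT"),
        Amplicon("a2", "ACGA", reconstructed="TTTGGGCCCAAA"),
        Amplicon("a3", "ACGC"),
    ]
    db = build_database(amps)
    # Count how many entries came from each tier.
    print(Counter(tier for tier, _ in db.values()))
```

If an amplicon has both a SILVA hit and a reconstruction, the SILVA hit wins in this sketch. That ordering mirrors the abstract's description, in which reconstruction is attempted only for unmatched amplicons.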
Journal description:
Molecular Ecology Resources promotes the creation of comprehensive resources for the scientific community, encompassing computer programs, statistical and molecular advancements, and a diverse array of molecular tools. Serving as a conduit for disseminating these resources, the journal targets a broad audience of researchers in the fields of evolution, ecology, and conservation. Articles in Molecular Ecology Resources are crafted to support investigations tackling significant questions within these disciplines.
In addition to original resource articles, Molecular Ecology Resources features Reviews, Opinions, and Comments relevant to the field. The journal also periodically releases Special Issues focusing on resource development within specific areas.