Taxonomy Identifiers (TaxId) for Biodiversity Genomics: a guide to getting TaxId for submission of data to public databases.

JCR category: Medicine (Q1)

Wellcome Open Research. Pub Date: 2024-10-15. eCollection Date: 2024-01-01. DOI: 10.12688/wellcomeopenres.22949.1

Mark Blaxter, Joana Pauperio, Conrad Schoch, Kerstin Howe

Volume 9, page 591. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11544195/pdf/

Citations: 0

Abstract

Biodiversity genomics critically depends on correct taxonomic identification of the sample from which data are derived. It also depends on tracking that taxonomic information through the systems that archive data and report on genome sequencing efforts. For submission of data to the International Nucleotide Sequence Database Collaboration (INSDC) databases (DNA DataBank of Japan [DDBJ], European Nucleotide Archive [ENA] and National Center for Biotechnology Information [NCBI]), samples and data derived from them must be assigned a species-level NCBI Taxonomy taxonomic identifier (TaxId, sometimes referred to as taxId or txid). We thus need to be able to identify the TaxId for a target species efficiently. Because the NCBI Taxonomy does not include all known species and cannot preemptively represent unknown taxa, we also need an efficient process for generating new TaxIds for species not yet listed. This document provides workflows for different kinds of TaxId acquisition scenarios and was created to guide users in these processes. Although developed for European projects such as Darwin Tree of Life and the European Reference Genome Atlas, the workflows are universally applicable and describe the use of ENA in resolving taxonomic issues.

Too Long: Didn't Read (TL;DR): Use the ENA REST API programmatically to retrieve TaxIds for target species and confirm that sequence data can be submitted to those TaxIds. Use the NCBI Taxonomy web interface to identify potential homotypic synonyms. Request a new TaxId from ENA for a species not yet in NCBI Taxonomy, and for species-like entries for which the full Linnaean binomen is not determined (see https://ena-docs.readthedocs.io/en/latest/faq/taxonomy_requests.html#creating-taxon-requests). Discuss directly with the NCBI Taxonomy curators or the curators at ENA and NCBI whenever you think there is an opportunity to improve their database.
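
The first TL;DR step (retrieve a TaxId and check that data can be submitted to it) could be sketched as below. This is a minimal illustration and not the authors' workflow: the base URL, the `scientific-name` path, and the JSON field names (`taxId`, `scientificName`, `rank`, `submittable`) are assumptions based on the public ENA taxonomy service and should be checked against the current ENA documentation. The helper names `scientific_name_url`, `parse_taxa` and `lookup` are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: endpoint layout and field names follow the ENA taxonomy
# REST service; verify against the ENA documentation before relying on them.
ENA_TAXONOMY_BASE = "https://www.ebi.ac.uk/ena/taxonomy/rest"


def scientific_name_url(name: str) -> str:
    """Build the lookup URL for an exact scientific name."""
    quoted = urllib.parse.quote(name.strip())
    return f"{ENA_TAXONOMY_BASE}/scientific-name/{quoted}"


def parse_taxa(payload: str) -> list[dict]:
    """Reduce a response body to the fields needed for a submission check."""
    try:
        records = json.loads(payload)
    except ValueError:
        # A non-JSON body (for example a plain-text "no results" message)
        # is treated as "no TaxId found".
        return []
    if isinstance(records, dict):
        records = [records]
    return [
        {
            "tax_id": int(r["taxId"]),
            "scientific_name": r.get("scientificName"),
            "rank": r.get("rank"),
            # Flags may arrive as strings or booleans, so normalise them.
            "submittable": str(r.get("submittable", "")).lower() == "true",
        }
        for r in records
    ]


def lookup(name: str, timeout: float = 30.0) -> list[dict]:
    """Query the service; an empty list means no matching TaxId was found."""
    try:
        with urllib.request.urlopen(scientific_name_url(name), timeout=timeout) as resp:
            return parse_taxa(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return []
        raise
```

Calling `lookup("Genus species")` for each name on a target list and keeping only the records whose `submittable` flag is true gives a first pass. Names that return an empty list are candidates for the synonym check and, failing that, a new TaxId request as described in the TL;DR.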


Source journal: Wellcome Open Research
Subject area: Biochemistry, Genetics and Molecular Biology (all)
CiteScore: 5.50
Self-citation rate: 0.00%
Articles published: 426
Review time: 1 week

About the journal: Wellcome Open Research publishes scholarly articles reporting any basic scientific, translational and clinical research that has been funded (or co-funded) by Wellcome. Each publication must have at least one author who has been, or still is, a recipient of a Wellcome grant. Articles must be original (not duplications). All research, including clinical trials, systematic reviews, software tools, method articles, and many others, is welcome and will be published irrespective of the perceived level of interest or novelty; confirmatory and negative results, as well as null studies, are all suitable. All articles are published using a fully transparent, author-driven model: the authors are solely responsible for the content of their article. Invited peer review takes place openly after publication, and the authors play a crucial role in ensuring that the article is peer-reviewed by independent experts in a timely manner. Articles that pass peer review will be indexed in PubMed and elsewhere. Wellcome Open Research is an Open Research platform: all articles are published open access; the publishing and peer-review processes are fully transparent; and authors are asked to include detailed descriptions of methods and to provide full and easy access to source data underlying the results to improve reproducibility.