Database: The Journal of Biological Databases and Curation: Latest Articles

A database on the historical and current occurrences of snakes in Eswatini.
IF 3.6 | Zone 4 | Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2025-01-18 DOI: 10.1093/database/baaf040
Ara Monadjem, Richard C Boycott, Thea Litscha-Koen, Adam Kane, Wisdom M Dlamini, Lindelwa Mmema, Katharine L Strutton, Zakhele Hlophe, Sara Padidar
Abstract: Snakes are among the most difficult terrestrial vertebrates to survey, resulting in poor distributional information on most species. This database comprises 3812 records of 58 species of snakes in 37 genera reported from within the boundaries of Eswatini. The data were compiled from multiple sources including museum specimens, iNaturalist records, literature records, and snake rescue operations. For each specimen reported in the database, we provide the scientific name, latitude and longitude coordinates, and location. Most records also have an associated date. This comprehensive database will be useful to biodiversity experts, conservationists, medical practitioners, researchers, and snake enthusiasts, especially for mapping and modelling snake distributions in the country. To allow easy viewing of the distribution of snakes in the country, we provide an online visualization tool, which should allow a greater number of non-scientists to utilize this database.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12462622/pdf/
Citations: 0
The Microbe Directory: a centralized database for biological interpretation of microbiome data.
IF 3.6 | Zone 4 | Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2025-01-18 DOI: 10.1093/database/baaf060
Maria A Sierra, Krista Ryon, Mohith R Arikatla, Radwa Elshafey, Hardik Bhaskar, Jacqueline Proszynski, Chandrima Bhattacharya, Heba Shaaban, David C Danko, Pradeep Ambrose, Sarah A Spaulding, Maria Mercedes Zambrano, The Microbe Directory Consortium, Christopher E Mason
Abstract: The Microbe Directory (TMD) is a centralized database of metadata for microbes from all domains that helps with the biological interpretation of metagenomic data. The database comprises phenotypical and ecological traits of microorganisms, which have been verified by independent manual annotations. This effort was made possible by a community of volunteer students worldwide who were trained in manual curation of microbiology data. To summarize this information, we have built an interactive browser that makes the database accessible to everyone, including non-bioinformaticians. We used the TMD data to analyse microbiome samples from different projects such as MetaSUB, TARA Oceans, Human Microbiome Project, and Sponge Microbiome Project, showcasing the utility of TMD. Furthermore, we compare our microbial annotations with annotations collected by artificial intelligence (AI) and demonstrate that despite the high speed of AI in reviewing and collecting microbial data, annotation requires domain knowledge and therefore manual curation. Collectively, TMD provides a unique source of information that can help to interpret microbiome data and uncover biological associations.
Database URL: www.themicrobedirectory.com/.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12462379/pdf/
Citations: 0
Integrated data-driven biotechnology research environments.
IF 3.6 | Zone 4 | Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2025-01-18 DOI: 10.1093/database/baaf064
Rosalia Moreddu
Abstract: In the past few decades, the life sciences have experienced an unprecedented accumulation of data, ranging from genomic sequences and proteomic profiles to heavy-content imaging, clinical assays, and commercial biological products for research. Traditional static databases have been invaluable in providing standardized and structured information. However, they fall short when it comes to facilitating exploratory data interrogation, real-time query, multidimensional comparison, and dynamic visualization. Integrated data-driven research environments aiming at supporting user-driven data queries and visualization offer promising new avenues for making the best use of the vast and heterogeneous data streams collected in biological research. This article discusses the potential of interactive and integrated frameworks, highlighting the importance of implementing this model in biotechnology research, while going through the state-of-the-art in database design, technical choices behind modern data management systems, and emerging needs in multidisciplinary research. Special attention is given to data interrogation strategies, user interface design, and comparative analysis capabilities, along with challenges such as data standardization and scalability in data-heavy applications. Conceptual features for developing interactive data environments across diverse life science domains are then presented in the use case of cell line selection for in vitro research to bridge the gap between research data generation, actionable biological insight, experimental design, and clinical relevance.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12462373/pdf/
Citations: 0
Navigating open-source waters: the pharmaceutical industry's role in bioontology development.
IF 3.6 | Zone 4 | Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2025-01-18 DOI: 10.1093/database/baaf066
Shawn Zheng Kai Tan, Joshua Daniel Valdez, Saritha Vettikunnel Kuriakose
Abstract: Bioontologies are core to many data management strategies, artificial intelligence and machine learning initiatives, and search functionality within many pharmaceutical companies. Despite their integral role, many bioontologies, along with their associated tools, are maintained predominantly by academia and their partners, government-supported initiatives, and the general community. In this comment, we will dive into some of the reasons behind this trend and argue that there exists a mutual advantage for the life science industry, and pharmaceutical companies in particular, to actively contribute to the advancement of public ontologies and open-source tools. This benefit extends beyond ethical and moral considerations and aligns with strategic interests. Additionally, we will explore practical approaches for contributing, sharing our (Novo Nordisk's research and early development) experience in doing so.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12482910/pdf/
Citations: 0
A new database of chestnut DNA fingerprints for genetic diversity assessment, precise varietal identification, and traceability.
IF 3.6 | Zone 4 | Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2025-01-18 DOI: 10.1093/database/baaf056
Ivan Fruggiero, Alessandro Maisto, Sara Passaro, Domenico Gentile, Angelina Nunziata, Nunzio D'Agostino
Abstract: The European chestnut (Castanea sativa Mill., Fagaceae) is ecologically and economically important, particularly in countries like Italy, Greece, Spain, and Turkey, where it supports rural economies and ecosystems. Accurate varietal recognition is crucial for managing chestnut groves but is hindered by the limitations of traditional methods, which require costly expertise and struggle to identify young, dormant, or scion trees. Recent advances in molecular tools, particularly single nucleotide polymorphism (SNP) markers identified through Kompetitive Allele-Specific PCR (KASP) technology, have transformed cultivar identification. To harness this potential, we developed KASTRACKdb, a genetic fingerprinting database for European chestnut that now integrates genotypic and phenotypic data for 150 chestnut accessions. Designed to translate KASP analysis results into practical and actionable insights, KASTRACKdb serves as a powerful tool for cultivar identification and management. The database offers three primary query modes and is designed for continuous upgrades, serving a crucial role in cataloguing the genetic diversity of chestnut trees, characterized by broad geographic distributions and significant genetic variation. This diversity is critical for conservation and breeding programs, enabling precise varietal identification and traceability to protect intellectual property, verify authenticity, and support the commercialization of high-value cultivars.
Database URL: KASTRACKdb is available online at https://kastrack.crea.gov.it/kastrackdb/?lang=en.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12462619/pdf/
Citations: 0
Biomedical literature-based clinical phenotype definition discovery using large language models.
IF 3.6 | Zone 4 | Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2025-01-18 DOI: 10.1093/database/baaf047
Samar Binkheder, Xiaofu Liu, Michael Wu, Lei Wang, Aditi Shendre, Sara K Quinney, Wei-Qi Wei, Lang Li
Abstract: Electronic health record (EHR) phenotyping is a high-demand task because most phenotypes are not usually readily defined. The objective of this study is to develop an effective text-mining approach that automatically extracts clinical phenotype definition-related sentences from biomedical literature. Abstract-level and full-text sentence-level classifiers were developed for clinical phenotype discovery from PubMed. We compared the performance of the abstract-level classifier across machine learning algorithms: support vector machine (SVM), logistic regression (LR), naïve Bayes, and decision tree. The SVM classifier showed the best performance (F-measure = 98%) in identifying clinical phenotype-relevant abstracts. It predicted 459 406 clinical phenotype-related abstracts. For the full-text sentence-level classifier, we compared the performance of SVM, LR, naïve Bayes, decision trees, convolutional neural networks, Bidirectional Encoder Representations from Transformers (BERT), and Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT). The BioBERT model was the best performer among the full-text sentence-level classifiers (F-measure = 91%). We used these two optimal classifiers for large-scale screening of the PubMed database, starting with abstract retrieval and followed by predicting clinical phenotype-related sentences from full texts. The large-scale screening predicted over two million clinical phenotype-related sentences. Lastly, we developed a knowledgebase using positively predicted sentences, allowing users to query clinical phenotype-related sentences with a phenotype term of interest. The Clinical Phenotype Knowledgebase (CliPheKB) enables users to search for clinical phenotype terms and retrieve sentences related to a specific clinical phenotype of interest (https://cliphekb.shinyapps.io/phenotype-main/). Building upon prior methods, we developed a text mining pipeline to automatically extract clinical phenotype definition-related sentences from the literature. This high-throughput phenotyping approach is generalizable and scalable, and it is complementary to existing EHR phenotyping methods.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12462612/pdf/
Citations: 0
PLoV: a comprehensive database of genetic variants leading to pregnancy loss.
IF 3.6 | Zone 4 | Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2025-01-18 DOI: 10.1093/database/baaf037
Evgeniia M Maksiutenko, Igor V Bezdvornykh, Yury A Barbitoff, Yulia A Nasykhova, Andrey S Glotov
Abstract: Pregnancy loss is an important reproductive health problem that affects many couples. Genetic factors play an important role in both spontaneous miscarriage and recurrent pregnancy loss, and the effect of genomic variants is recognized as one of the major causes of pregnancy loss in euploid foetuses. In this work, we extend our previous analysis of the genetic landscape of pregnancy loss and develop a Pregnancy Loss genetic Variant (PLoV) database to aggregate information about mutations that have been implicated in pregnancy loss. The database contains information about 534 genetic variants that have been observed in 421 cases across 47 studies, including foetus-only, parent-only, and trio-based studies. For each case, the database includes a detailed description of the phenotype, including ultrasound data (if provided in the original article). The genetic variants are scattered across all chromosomes in the human genome and affect a total of 292 unique genes. We provide public access to the PLoV database at https://plovdb.ott.ru/.
Database URL: https://plovdb.ott.ru/.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12462621/pdf/
Citations: 0
p53motifDB: integration of genomic information and tumour suppressor p53 binding motifs.
IF 3.6 | Zone 4 | Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2025-01-18 DOI: 10.1093/database/baaf053
Gabriele Baniulyte, Sawyer M Hicks, Morgan A Sammons
Abstract: The tumour suppressor gene TP53 encodes the DNA binding transcription factor p53 and is one of the most mutated genes in human cancer. Tumour suppressor activity requires binding of p53 to its DNA response elements and subsequent transcriptional activation of a diverse set of target genes. Despite decades of close study, the logic underlying p53 interactions with its numerous potential genomic binding sites and target genes is not yet fully understood. Here, we present a database of DNA and chromatin-based information focused on putative p53 binding sites in the human genome to allow users to generate and test new hypotheses related to p53 activity in the genome. Users can query genomic locations based on experimentally observed p53 binding, regulatory element activity, genetic variation, evolutionary conservation, chromatin modification state, and chromatin structure. We present multiple use cases demonstrating the utility of this database for generating novel biological hypotheses, such as chromatin-based determinants of p53 binding and potential cell type-specific p53 activity. All database information is also available as a precompiled SQLite database for use in local analysis or as a Shiny web application.
Database URL: https://p53motifDB.its.albany.edu.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12462388/pdf/
Citations: 0
Characterization and automated classification of sentences in the biomedical literature: a case study for biocuration of gene expression and protein kinase activity.
IF 3.6 | Zone 4 | Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2025-01-18 DOI: 10.1093/database/baaf063
Daniela Raciti, Kimberly M Van Auken, Valerio Arnaboldi, Christopher J Tabone, Hans-Michael Muller, Paul W Sternberg
Abstract: Biological knowledgebases are essential resources for biomedical researchers, providing ready access to gene function and genomic data. Professional, manual curation of knowledgebases, however, is labour-intensive and thus high-performing machine learning (ML) methods that improve biocuration efficiency are needed. Here, we report on sentence-level classification to identify biocuration-relevant sentences in the full text of published references for two gene function data types: gene expression and protein kinase activity. We performed a detailed characterization of sentences from references in the WormBase bibliography and used this characterization to define three tasks for classifying sentences as either (i) fully curatable, (ii) fully and partially curatable, or (iii) all language-related. We evaluated various ML models applied to these tasks and found that GPT and BioBERT achieve the highest average performance, resulting in F1 performance scores ranging from 0.89 to 0.99 depending upon the task. Moreover, our inter-annotator agreement analyses and curator timing exercises demonstrated that curators readily converged on classification of high-quality training sentences that take a relatively short period of time to collect, making expansion of this approach to other data types a realistic addition to existing biocuration workflows. Our findings demonstrate the feasibility of extracting biocuration-relevant sentences from full text. Integrating these models into professional biocuration workflows, such as those used by the Alliance of Genome Resources and the ACKnowledge community curation platform, might well facilitate efficient and accurate annotation of the biomedical literature.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12482909/pdf/
Citations: 0
A novel taxonomic database for eukaryotic mitochondrial cytochrome oxidase subunit I gene (eKOI), with a focus on protists diversity.
IF 3.6 | Zone 4 | Biology
Database: The Journal of Biological Databases and Curation Pub Date : 2025-01-18 DOI: 10.1093/database/baaf057
Rubén González-Miguéns, Àlex Gàlvez-Morante, Margarita Skamnelou, Meritxell Antó, Elena Casacuberta, Daniel J Richter, Enrique Lara, Daniel Vaulot, Javier Del Campo, Iñaki Ruiz-Trillo
Abstract: Metabarcoding has emerged as a robust method for assessing biodiversity patterns by retrieving environmental DNA directly from ecosystems. While the 18S rRNA gene is the primary genetic marker used for broad eukaryotic metabarcoding, it has limitations in resolving lower taxonomic levels. A potential alternative is the mitochondrial cytochrome oxidase subunit I (COI) gene because it offers resolution at the species level. However, the COI gene lacks a comprehensive, curated, taxonomically informed database including protists. To address this gap, we introduce eKOI, a novel, curated COI gene database designed to enhance the taxonomic annotation for protists that can be used for COI-based metabarcoding. eKOI integrates data from GenBank and mitochondrial genomes, followed by extensive manual curation to eliminate redundancies and contaminants, recovering 15 947 sequences within 80 eukaryotic phyla. We validated the use of eKOI by reannotating several COI metabarcoding datasets, revealing previously unidentified protist biodiversity and demonstrating the database's utility for community-level analyses.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12462617/pdf/
Citations: 0