dataFishing: An efficient Python tool and user-friendly web-form for mining mitochondrial and chloroplast sequences, taxonomic, and biodiversity data
Luan Rabelo, Davidson Sodré, Oscar David Albito Balcázar, Murilo Furtado do Rosário, Aurycéia Jaquelyne Guimarães-Costa, Grazielle Gomes, Iracilda Sampaio, Marcelo Vallinoto
Ecological Informatics, volume 85, Article 102970. Published 2024-12-22. DOI: 10.1016/j.ecoinf.2024.102970
Impact factor: 5.8 (JCR Q1, Ecology)
https://www.sciencedirect.com/science/article/pii/S1574954124005120
Citation count: 0
Abstract
NCBI GenBank and BOLD Systems are important databases for biodiversity research. Their deposited data support species identification, evolutionary studies, biodiversity monitoring, and assessments of how climate change may affect species distributions. Other information, such as taxonomy, collection site locations, and conservation status, is often critical for these studies. Some databases, such as GBIF, BOLD Systems, and GenBank, provide data on the taxonomy, habitat, and geographic distribution of various taxonomic groups, while others, such as WoRMS and IUCN, hold specific data on marine species and conservation status, respectively. However, depending on the taxonomic group studied, a search can require dozens or hundreds of queries in each database, which is a time-consuming and error-prone process. To facilitate and automate access to this information, we introduce dataFishing, available as a Python script and a web form. dataFishing is faster and more efficient than R packages such as bold, taxize, rgbif, rredlist, and worrms for obtaining taxonomic information from the consulted databases. Moreover, it retrieves DNA sequences, common names, synonyms, conservation status, and species occurrence points. The tool is free and enables more systematic and time-efficient searches across these databases.
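To illustrate the kind of batch query the abstract describes, here is a minimal Python sketch that looks up the taxonomy of several species names through GBIF's public species-match endpoint. It is not dataFishing's own code: the function names (build_match_url, fetch_json, extract_taxonomy, batch_taxonomy) are hypothetical, the choice of GBIF alone is an assumption made for brevity, and the sketch only assumes that the endpoint returns a JSON object with one field per taxonomic rank.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# GBIF's public name-matching endpoint (no API key required).
GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"

# Rank fields this sketch assumes are present in a match response.
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def build_match_url(name):
    """Return the query URL for one scientific name."""
    return f"{GBIF_MATCH_URL}?{urlencode({'name': name})}"


def fetch_json(url):
    """Download a URL and decode its JSON body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def extract_taxonomy(record):
    """Keep only the rank fields; ranks missing from the record become None."""
    return {rank: record.get(rank) for rank in RANKS}


def batch_taxonomy(names, fetch=fetch_json):
    """Query every name in turn, recording an error entry instead of aborting.

    `fetch` is injectable so the function can be exercised without network access.
    """
    results = {}
    for name in names:
        try:
            results[name] = extract_taxonomy(fetch(build_match_url(name)))
        except OSError as err:  # urllib's URLError is a subclass of OSError
            results[name] = {"error": str(err)}
    return results
```

Calling batch_taxonomy(["Rhinella marina", "Arapaima gigas"]) would issue one request per name and return a dictionary keyed by the input names. A failed request is recorded under its own name rather than stopping the batch, which matters when a query list runs to hundreds of species.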
About the journal:
The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science, and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness, and leverage complex data, and the critical need to inform sustainable management in view of global environmental and climate change.
The nature of the journal is interdisciplinary at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, modelling and prediction of ecological data.