Elevating the Fitness of Use of GBIF Occurrence Datasets: A proposal for peer review
Vijay Barve
Biodiversity Information Science and Standards, published 2023-09-07
DOI: 10.3897/biss.7.112237 (https://doi.org/10.3897/biss.7.112237)
Abstract
Biodiversity data plays a pivotal role in understanding and conserving our natural world. As the largest occurrence data aggregator, the Global Biodiversity Information Facility (GBIF) serves as a valuable platform for researchers and practitioners to access and analyze biodiversity information from across the globe (Ball-Damerow et al. 2019). However, ensuring the quality of GBIF datasets remains a critical challenge (Chapman 2005).
The community emphasizes the importance of data quality and its direct impact on fitness for use in biodiversity research and conservation efforts (Chapman et al. 2020). While the quantity of data GBIF provides continues to grow, the quality of these datasets varies significantly (Zizka et al. 2020). The biodiversity informatics community has worked to ensure data quality at every step of data creation, curation, publication (Waller et al. 2021), and end-use (Gueta et al. 2019) by employing automated tools and flagging systems to identify and address issues. However, more work is needed to address data quality problems effectively and to improve the fitness for use of GBIF-mediated data.
I highlight a missing component in GBIF's data publication process: formal peer review. Although GBIF already provides the essential elements of a data paper, including detailed metadata, data accessibility, and robust data citation mechanisms, the lack of peer review limits the credibility and reliability of the datasets mobilized through GBIF.
To bridge this gap, I propose implementing a comprehensive peer review system within GBIF. Under this system, domain experts and data scientists would rigorously evaluate GBIF datasets for accuracy, completeness, and consistency. This process would enhance the trustworthiness and usability of datasets, enabling researchers and policymakers to make informed decisions based on reliable biodiversity information.
Furthermore, establishing a peer review system within GBIF would foster collaboration and knowledge exchange within the biodiversity community, as experts provide constructive feedback to dataset authors. This iterative process would not only improve data quality but also encourage data contributors to adhere to best practices, thereby raising the overall standard of biodiversity data mobilization through GBIF.