Carla Novoa Sepúlveda, Stephan Biebl, Nadja Pöllath, S. Seifert, Markus Weiss, Tanja Weibulat, Dagmar Triebel
Biodiversity Information Science and Standards. Published 2023-09-08. DOI: 10.3897/biss.7.112391 (https://doi.org/10.3897/biss.7.112391)
GBIF-Compliant Data Pipeline for the Management and Publication of a Global Taxonomic Reference List of Pests in Natural History Collections
There is a growing demand for monitoring pests in natural history collections (NHCs) and establishing integrated pest management (IPM) solutions (Crossman and Ryde 2022). In this context, up-to-date taxonomic reference lists and controlled vocabularies following standard schemes are crucial and facilitate the recording of organisms detected in collections.
The data pipeline described here results in the publication of a taxon reference list based on information from online resources and standard IPM literature. Most of the more than 140 pest taxa at species level and above are insects; the rest belong to other animal groups and fungi.
The complete taxon names, synonyms, English and German common names, and the hierarchical classification (parent-child relationships) are organised in a client-server installation of DiversityTaxonNames (DTN) at the Bavarian Natural History Collections (SNSB). DTN is a Microsoft Structured Query Language (MS SQL) database tool of the Diversity Workbench (DWB) framework with a published entity-relationship (ER) diagram (Hagedorn et al. 2019). The list is managed using the Global Biodiversity Information Facility (GBIF) backbone taxonomy as an external name resource, with each taxon linked to its Wikidata Q item ID as an external persistent identifier (PID). Moreover, information on pest occurrence in NHCs is given, distinguishing the Consortium of European Taxonomic Facilities (CETAF) major NHC collection types affected (i.e., heritage sciences, life sciences and earth sciences) and the object categories, e.g., natural objects/specimens damaged. Data management in DTN supports long-term curation by list curators.
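The kind of record described above (name, synonyms, common names, parent-child link, external identifiers) can be sketched as a small data structure. This is only an illustration: the class `Taxon`, its field names and the function `classification` are assumptions made for this sketch and do not reproduce the DTN database schema. The example taxa were chosen for illustration only, and the external identifiers are left empty rather than invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Taxon:
    """One list entry. Field names are illustrative, not the DTN schema."""
    taxon_id: int
    name: str
    rank: str
    parent_id: Optional[int] = None        # parent-child relationship
    synonyms: List[str] = field(default_factory=list)
    common_names: Dict[str, str] = field(default_factory=dict)  # language -> name
    gbif_key: Optional[int] = None         # key in the GBIF backbone taxonomy
    wikidata_qid: Optional[str] = None     # Wikidata Q item ID used as external PID


def classification(taxa: Dict[int, Taxon], taxon_id: int) -> List[str]:
    """Follow parent links up to the root and return the names root-first."""
    path: List[str] = []
    seen = set()
    current: Optional[int] = taxon_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in parent links at taxon {current}")
        seen.add(current)
        taxon = taxa[current]
        path.append(taxon.name)
        current = taxon.parent_id
    return list(reversed(path))


# Illustrative records; gbif_key and wikidata_qid are deliberately left unset.
records = [
    Taxon(1, "Coleoptera", "order"),
    Taxon(2, "Dermestidae", "family", parent_id=1),
    Taxon(3, "Anthrenus", "genus", parent_id=2),
    Taxon(4, "Anthrenus verbasci", "species", parent_id=3,
          common_names={"en": "varied carpet beetle",
                        "de": "Wollkrautblütenkäfer"}),
]
taxa = {t.taxon_id: t for t in records}

print(" > ".join(classification(taxa, 4)))
```

Storing only the parent link per taxon keeps each record small; the full classification is derived on demand, and the cycle check guards against inconsistent parent assignments.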
The generic data pipeline for the management and publication of a Global Taxonomic Reference List of Pests in NHCs is based on the DTN taxon lists concept and architecture and is described under About "Taxon list of pest organisms for IPM at natural history collections compiled at the SNSB". It comprises four steps (A–D), each with a result relevant to best practices in data processing (Fig. 1).
A. The data is managed and processed for publication by list curators in the database DiversityTaxonNames (DTN).
As a result, the list can be kept up to date and is ready, without transformation, for use in IPM solutions at any NHC with a DiversityCollection installation and as part of the DWB cloud services.
B. The up-to-date data is publicly available via the DTN REST Webservice for Taxon Lists with a machine-readable Application Programming Interface (API).
As a result, the dynamic list publication service can be used as a reference backbone for establishing IPM solutions for pest monitoring at any NHC.
C. The data is provided via the GBIF checklist data publication pipeline of the SNSB, which uses GBIF validation tools and delivers the list to GBIF as a Darwin Core Archive (DwC-A, zip format).
As a result, the checklist information becomes part of the GBIF network with GBIF ChecklistBank and GBIF Global Taxonomy. This ensures future compliance of data with the Findability, Accessibility, Interoperability, and Reuse (FAIR) guiding principles.
D. The DTN REST Web service for Taxon Lists (currently 60 lists) is registered and accessible through the German Federation for Biological Data (GFBio) Terminology service.
As a result, the lists with external PIDs and other information are available as a service (see DTN lists overview). In the upcoming Research Data Commons of the German National Research Data Infrastructure (NFDI) Initiative (Diepenbroek et al. 2021), the service will be part of a standardized layer of APIs with an agreed interface scheme for improved accessibility.
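To make step C more concrete, the following sketch packages a few taxon rows as a minimal Darwin Core Archive: a zip file containing a tab-separated core file and a `meta.xml` descriptor that maps columns to Darwin Core terms. It is not the SNSB pipeline. The function names, the file name `taxon.txt`, the choice of columns and the sample rows are assumptions for illustration, and a dataset metadata file (commonly `eml.xml`), which a publishable archive would normally include, is omitted.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"
# Column order of the core file; all four are standard Darwin Core terms.
COLUMNS = ["taxonID", "scientificName", "taxonRank", "parentNameUsageID"]


def to_tsv(rows, columns):
    """Serialise rows as tab-separated text with one header line."""
    lines = ["\t".join(columns)]
    for row in rows:
        values = ["" if row.get(c) is None else str(row[c]) for c in columns]
        if any("\t" in v or "\n" in v for v in values):
            raise ValueError("tabs and newlines are not allowed in field values")
        lines.append("\t".join(values))
    return "\n".join(lines) + "\n"


def build_meta_xml(columns):
    """Describe the core file so readers can map columns to Darwin Core terms."""
    archive = ET.Element("archive", {"xmlns": "http://rs.tdwg.org/dwc/text/"})
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # written literally as \t in the XML
        "linesTerminatedBy": "\\n",
        "fieldsEnclosedBy": "",
        "ignoreHeaderLines": "1",
        "rowType": DWC + "Taxon",
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = "taxon.txt"
    ET.SubElement(core, "id", {"index": "0"})
    for index, term in enumerate(columns):
        ET.SubElement(core, "field", {"index": str(index), "term": DWC + term})
    return ET.tostring(archive, encoding="unicode")


def build_archive(rows):
    """Return the bytes of a zip archive holding taxon.txt and meta.xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("taxon.txt", to_tsv(rows, COLUMNS))
        archive.writestr("meta.xml", build_meta_xml(COLUMNS))
    return buffer.getvalue()


# Illustrative rows only; not taken from the published list.
rows = [
    {"taxonID": 1, "scientificName": "Coleoptera", "taxonRank": "order",
     "parentNameUsageID": None},
    {"taxonID": 2, "scientificName": "Dermestidae", "taxonRank": "family",
     "parentNameUsageID": 1},
    {"taxonID": 3, "scientificName": "Anthrenus verbasci", "taxonRank": "species",
     "parentNameUsageID": 2},
]

archive_bytes = build_archive(rows)
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    print(sorted(archive.namelist()))
```

Building the archive in memory keeps the sketch free of side effects; writing `archive_bytes` to a `.zip` file would yield a file that can be passed to a validation tool before submission.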
The provided tools, API and data are part of the upcoming NFDI4Biodiversity service portfolio. Future scenarios include the use of the list items and properties as classes for diagnosis purposes with DiversityNaviKey (Triebel et al. 2021), including the publication of images for identifying pests.
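One way an NHC could use such a list as a reference backbone (step B) is to resolve whatever name staff record during monitoring, whether a scientific name, a synonym or a common name, to the accepted taxon name. The sketch below assumes the list has already been retrieved as JSON. The payload structure and the key names `acceptedName`, `synonyms` and `commonNames` are hypothetical and do not reproduce the response format of the DTN web service; the two entries are illustrative and not taken from the published list.

```python
import json
import unicodedata

# Hypothetical payload: structure and key names are assumptions for this sketch.
PAYLOAD = """
[
  {"acceptedName": "Anthrenus verbasci",
   "synonyms": [],
   "commonNames": {"en": ["varied carpet beetle"],
                   "de": ["Wollkrautblütenkäfer"]}},
  {"acceptedName": "Tineola bisselliella",
   "synonyms": ["Tinea bisselliella"],
   "commonNames": {"en": ["webbing clothes moth"],
                   "de": ["Kleidermotte"]}}
]
"""


def normalise(name):
    """Case-fold and collapse whitespace so lookups tolerate sloppy input."""
    return " ".join(unicodedata.normalize("NFC", name).casefold().split())


def build_index(entries):
    """Map every accepted name, synonym and common name to its accepted name."""
    index = {}
    for entry in entries:
        accepted = entry["acceptedName"]
        names = [accepted, *entry.get("synonyms", [])]
        for per_language in entry.get("commonNames", {}).values():
            names.extend(per_language)
        for name in names:
            # The first entry claiming a name wins; later duplicates are ignored.
            index.setdefault(normalise(name), accepted)
    return index


def resolve(index, recorded_name):
    """Return the accepted name for a recorded name, or None if unknown."""
    return index.get(normalise(recorded_name))


index = build_index(json.loads(PAYLOAD))
print(resolve(index, "  varied   carpet beetle "))
print(resolve(index, "KLEIDERMOTTE"))
print(resolve(index, "Tinea bisselliella"))
print(resolve(index, "unknown organism"))
```

Returning `None` for unknown names, rather than guessing, leaves the decision to a curator, which matches the curated character of the list.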