Katherine Tattersall, P. Newman, Sachit Rajbhandari, Dave Watts, Mahmoud Sadeghi
An Australian Model of Cooperative Data Publishing to OBIS and GBIF
Biodiversity Information Science and Standards, published 2023-09-07. DOI: 10.3897/biss.7.112228 (https://doi.org/10.3897/biss.7.112228)
Abstract
Australia's Commonwealth Scientific and Industrial Research Organisation (CSIRO) hosts the Australian nodes of both the Ocean Biodiversity Information System (OBIS) and the Global Biodiversity Information Facility (GBIF) within its National Collections and Marine Infrastructure (NCMI) business unit. The OBIS node (OBIS-AU) is led by the NCMI Information and Data Centre and publishes marine biodiversity data in the Darwin Core (DwC) standard via an Integrated Publishing Toolkit (IPT), currently holding over 450 marine datasets. The Australian GBIF node is hosted by a separate team at the Atlas of Living Australia (ALA), a national-scale biodiversity analytical and knowledge delivery portal. The ALA aggregates and publishes over 800 terrestrial and marine datasets from a wide variety of research institutes, museums and collections, governments and citizen science agencies, including OBIS-AU. Many datasets published by OBIS-AU are harvested and republished by ALA, and vice versa.
OBIS-AU identifies, quality-controls and formats marine biodiversity and observation data, then publishes directly to the OBIS international data repository and portal using GBIF IPT technology. The ALA data processing pipeline harvests and aggregates datasets from many sources and enhances them with authoritative taxonomic and spatial reference data before passing the data on to GBIF. Datasets managed by both groups are at risk of duplicated records and incomplete metadata harvests, so OBIS-AU and ALA are working together to rationalise the publication pathways for these datasets and to follow a single collaborative workflow across both units for publication to GBIF. The two data management groups recently established an agreement to cooperatively publish marine data and eDNA data, and OBIS-AU has commenced publishing datasets directly to GBIF with ALA endorsement.
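The duplication risk described above can be illustrated with a minimal sketch. It assumes that both publication pathways carry the Darwin Core occurrenceID term unchanged, which the abstract does not state; the function name, the source labels and the sample records are hypothetical and are not taken from the OBIS-AU or ALA pipelines.

```python
from collections import defaultdict


def find_duplicate_occurrences(records_by_source):
    """Return {occurrenceID: sorted source names} for IDs present in more than one source."""
    sources_by_id = defaultdict(set)
    for source, records in records_by_source.items():
        for record in records:
            occurrence_id = record.get("occurrenceID")
            if occurrence_id:  # records without an identifier cannot be matched
                sources_by_id[occurrence_id.strip()].add(source)
    return {
        oid: sorted(sources)
        for oid, sources in sources_by_id.items()
        if len(sources) > 1
    }


# Hypothetical records, invented for illustration only.
obis_au = [
    {"occurrenceID": "urn:example:occ:1", "scientificName": "Thunnus maccoyii"},
    {"occurrenceID": "urn:example:occ:2", "scientificName": "Macrocystis pyrifera"},
]
ala = [
    {"occurrenceID": "urn:example:occ:2", "scientificName": "Macrocystis pyrifera"},
    {"occurrenceID": "urn:example:occ:3", "scientificName": "Ecklonia radiata"},
]

duplicates = find_duplicate_occurrences({"OBIS-AU": obis_au, "ALA": ala})
print(duplicates)
```

A check of this kind only flags candidate duplicates; deciding which pathway should publish a flagged record remains a matter for the agreed collaborative workflow.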
We present the convergent evolution of OBIS and GBIF data publishing in Australia, the adaptive data workflows used to maintain data and metadata integrity, the challenges encountered, and how domain expertise ensures data quality. We also describe the benefits of sharing data skills and code, especially in publishing eDNA data types in DwC (using the DNA-derived data extension) and in exploring the new CamTrap Data Package, which is built on Frictionless Data. Finally, we present the work that both data groups are doing toward adopting GBIF's new Unified Data Model for publishing data. This Australian case study demonstrates the strengths of collaborative data publishing and offers a model that minimises replication of data in global aggregators through the development of regional integrated data publishing pipelines.