{"title":"The Role of the OLS Program in the Development of echinopscis (an Extensible Notebook for Open Science on Specimens)","authors":"Nicky Nicolson, Eve Lucas","doi":"10.3897/biss.7.112318","DOIUrl":"https://doi.org/10.3897/biss.7.112318","url":null,"abstract":"Starting in early 2022, biodiversity informatics researchers at Kew have been developing echinopscis: an \"extensible notebook for open science on specimens\". This aims to build on the early experiments that our community conducted with \"e-taxonomy\": the development of tools and techniques to enable taxonomic research to be conducted online. Early e-taxonomic tools (e.g., Scratchpads; Smith et al. 2011) had to perform a wide range of functions, but in the past decade or so the move towards open science has built better support for generic functionality, such as reference management (Zotero) and document production (pandoc), skills development in automation and revision control to support reproducible science, as documented by the Turing Way (The Turing Way Community 2022), and an awareness of the importance of community building. We have developed echinopscis at Kew via a cross-departmental collaboration between researchers in biodiversity informatics and accelerated taxonomy. We have also benefitted from valuable input and advice from our many colleagues in associated projects and organisations around the world. \u0000 OLS (originally Open Life Sciences) is a training and mentoring program for Open Science leaders with a focus on community building. The name was recently (2023) made more generic (\"Open Seeds\") whilst retaining its well-known acronym \"OLS\". OLS is a 16-week cohort-based mentoring program. Participants apply to join a cohort with a project that is developed through the 16 weeks. 
Each week of the syllabus alternates between time with a dedicated Open Science mentor and cohort calls, which are used to develop skills in project design, community building, open development & licensing, and inclusivity. Over 500 practitioners, experts and learners have participated across the seven completed cohorts of OLS' Open Seeds training and mentoring. Through this programme, over 300 researchers and open leaders from across six continents have designed, launched and supported 200 projects from different disciplines worldwide. The next cohort will run between September 2023 and January 2024, and will be the eighth iteration of the program. \u0000 This talk will briefly outline the work that we have done to set up and experiment with echinopscis, but will focus on the impact that the OLS program has had in its development. We will also include the use of techniques learned through OLS in other biodiversity informatics projects. OLS acknowledges that their program receives relatively few applications from project leads in biodiversity and we hope that this talk will be informative for Biodiversity Information Standards (TDWG) participants and can be used to build productive links between these communities.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"79 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77794142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GBIF-Compliant Data Pipeline for the Management and Publication of a Global Taxonomic Reference List of Pests in Natural History Collections","authors":"Carla Novoa Sepúlveda, Stephan Biebl, Nadja Pöllath, S. Seifert, Markus Weiss, Tanja Weibulat, Dagmar Triebel","doi":"10.3897/biss.7.112391","DOIUrl":"https://doi.org/10.3897/biss.7.112391","url":null,"abstract":"There is a growing demand for monitoring pests in natural history collections (NHCs) and establishing integrated pest management (IPM) solutions (Crossman and Ryde 2022). In this context, up-to-date taxonomic reference lists and controlled vocabularies following standard schemes are crucial and facilitate recording organisms detected in collections.\u0000 The data pipeline described here results in the publication of a taxon reference list based on information from online resources and standard IPM literature. Most of the over 140 pest taxa at species level and above are insects; the rest belong to other animal groups and fungi.\u0000 The complete taxon names, synonyms, English and German common names, and the hierarchical classification (parent-child relationships) are organised in a client-server installation of DiversityTaxonNames (DTN) at the Bavarian Natural History Collections (SNSB). DTN is a Microsoft Structured Query Language (MS SQL) database tool of the Diversity Workbench (DWB) framework with a published Entity Relation (ER) diagram (Hagedorn et al. 2019). The management is done using the Global Biodiversity Information Facility (GBIF) backbone taxonomy as external name resource, with linkage to the respective Wikidata Q item ID as an external persistent identifier (PID). Moreover, information on pest occurrence in NHCs is given, distinguishing the Consortium of European Taxonomic Facilities (CETAF) major NHC collection types affected (i.e., heritage sciences, life sciences and earth sciences) and the object categories, e.g., natural objects/specimens damaged. 
The data management in DTN enables the long-running curation, done by list curators.\u0000 The generic data pipeline for the management and publication of a Global Taxonomic Reference List of Pests in NHCs is based on the DTN taxon lists concept and architecture and described under About \"Taxon list of pest organisms for IPM at natural history collections compiled at the SNSB\". It includes four steps (A–D) with significant results for best practices of data processing (Fig. 1).\u0000 A. The data is managed and processed for publication by list curators in the database DiversityTaxonNames (DTN).\u0000 As a result, the list can be kept up-to-date and is—without transformation—ready to be used for IPM solutions at any NHC with a DiversityCollection installation and as part of the DWB cloud services.\u0000 B. The up-to-date data is publicly available via the DTN REST Webservice for Taxon Lists with machine-readable Application Programming Interface (API).\u0000 As a result, the dynamic list publication service can be used as a reference backbone for establishing IPM solutions for pest monitoring at any NHC.\u0000 C. The data is provided via the GBIF checklist data publication pipeline of the SNSB through GBIF validation tools and Darwin Core Archive in DwC-A (zip format) for GBIF.\u0000 As a result, the checklist information becomes part of the GBIF network with GBIF ChecklistBank and GBIF Global Taxonomy. 
This ensures future c","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84719431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Connecting the Dots: Aligning human capacity through networks toward a globally interoperable Digital Extended Specimen (DES) infrastructure","authors":"Elizabeth R. Ellwood, Wouter Addink, John Bates, Andrew Bentley, Jutta Buschbom, Alina Freire-Fierro, Jose Fortes, David Jennings, Kerstin Lehnert, Bertram Ludäscher, Keping Ma, James Macklin, Austin Mast, Joe Miller, Gil Nelson, Nicky Nicolson, Jyotsna Pandey, Deborah Paul, Sinlan Poo, Richard Rabeler, Pamela S. Soltis, Elycia Wallis, Michael Webster, Andrew Young, Breda Zimkus","doi":"10.3897/biss.7.112390","DOIUrl":"https://doi.org/10.3897/biss.7.112390","url":null,"abstract":"Thanks to substantial support for biodiversity data mobilization in recent decades, billions of occurrence records are openly available, documenting life on Earth and enabling timely research, awareness raising, and policy-making. Initiatives across local to global scales have been separately funded to serve different, yet often overlapping audiences of data users, and have developed a variety of platforms and infrastructures to meet the needs of these audiences. The independent progress of biodiversity data providers has led to innovations as well as challenges for the community at large as we move towards connecting and linking a diversity of information from disparate sources as Digital Extended Specimens (DES). Recognizing a need for deeper and more frequent opportunities for communication and collaboration across the globe, an ad-hoc group of representatives of various international, national, and regional organizations have been meeting virtually since 2020 to provide a forum for updates, announcements, and shared progress. This group is provisionally named International Partners for the Digital Extended Specimen (IPDES), and is guided by these four concepts: Biodiversity, Connection, Knowledge and Agency. 
Participants in IPDES include representatives of the Global Biodiversity Information Facility (GBIF), Integrated Digitized Biocollections (iDigBio), American Institute of Biological Sciences (AIBS), Biodiversity Collections Network (BCoN), Natural Science Collections Alliance (NSCA), Distributed System of Scientific Collections (DiSSCo), Atlas of Living Australia (ALA), Biodiversity Information Standards (TDWG), Society for the Preservation of Natural History Collections (SPNHC), National Specimen Information Infrastructure of China (NSII), and South African National Biodiversity Institute (SANBI), as well as individuals involved with biodiversity informatics initiatives, natural science collections, museums, herbaria, and universities. Our global partners group strives to increase representation from around the globe as we aim to enable research that contributes to novel discoveries and addresses the societal challenges leading to the biodiversity crisis. Our overarching mission is to expand on the community-driven successes to connect biodiversity data and knowledge through coordination of a globally integrated network of stakeholders to enable an extensible technical and social infrastructure of data, tools, and working practices in support of our vision. The main work of our group thus far includes publishing a paper on the Digital Extended Specimen (Hardisty et al. 2022), organizing and hosting an array of activities at conferences, and asynchronous online work and forum-based exchanges. 
We aim to advance discussion on topics of broad interest to our community such as social and technical capacity building, broadening participation, expanding social and data networks, improving data models and building a backbone for the DES, and ide","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136299015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementing CARE Principles to Link Noongar Language and Knowledge to Western Science through the Atlas of Living Australia","authors":"N. Raisbeck‐Brown, Denise Smith-Ali","doi":"10.3897/biss.7.112349","DOIUrl":"https://doi.org/10.3897/biss.7.112349","url":null,"abstract":"The Atlas of Living Australia (ALA), Australia's national online biodiversity database, is partnering with the Noongar Boodjar Language Centre (NBALC) to promote Indigenous language and knowledge by including Noongar names for plants and animals in the ALA. Names are included in the ALA species page for each plant and animal and knowledge is built into the Noongar Plant and Animal online Encyclopedia, hosted in the ALA. We demonstrate the use of CARE principles (Collective Benefit, Authority to Control, Responsibility, and Ethics (Carroll et al. 2020)) to engage, support, and deliver the project and outcomes to the Noongar people and communities working with us. \u0000 The ALA addresses the FAIR principles (Wilkinson et al. 2016) for data management and stewardship ensuring data are findable, accessible, interoperable, and reusable. The ALA is partnering with NBALC in Perth to ensure all sharing of Noongar data is on Noongar terms. NBALC and ALA have been working with Noongar-Wadjari, a southern clan from the Fitzgerald River area in Western Australia, to collect, protect and share their language and traditional knowledge for local species.\u0000 The Noongar Encyclopedia project exhibits Collective Benefit because it is a co-innovation project that was co-designed by NBALC and ALA. The project’s activities were designed by the Community-endorsed representatives, the Knowledge Holders. The aims and aspirations of the Community were included in the project design to ensure equitable outcomes. NBALC’s more than 25-year relationship with the Community, and as Noongar people themselves, meant they had a good understanding of what the Community might want from the project. 
These assumptions were tested and refined during the first Community consultation, before the project plan was finalised. The Community are keen for their traditional knowledge to be shared and freely available to their Community. The ALA only shared knowledge that has passed through strict consent processes. It is seen as a safe and stable digital environment for now and the future, and where the traditional knowledge can be accessed freely and easily. The link to western science knowledge is secondary to knowledge sharing for most of the Aboriginal and Torres Strait Islander Communities that the ALA are working with although the benefits of scientists having access to both knowledge systems is seen as a positive step in care for Country into the future.\u0000 The Noongar Encyclopedia project ensures Noongar Authority to Control these data because NBALC, as an Aboriginal organisation, led by Noongar people, understands the rights and interests of the Communities we are working with. Protection of these rights and inclusion of Community interests are written into the project methodology as part of the project co-design. It is important to ensure the project is working with the right people within the Community. NBALC facilitates this by finding people who hold traditional knowledge, and can trace","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85507465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bidirectional Linking: Benefits, challenges, pitfalls, and solutions","authors":"Guido Sautter, D. Agosti","doi":"10.3897/biss.7.112344","DOIUrl":"https://doi.org/10.3897/biss.7.112344","url":null,"abstract":"Taxonomy, and biodiversity science in general, mainly revolve around four types of entities, which are available digitally in ever increasing numbers from different services: (1) Physical specimens (kept in museums and other collections around the world) and observations are available digitally via the Global Biodiversity Information Facility (GBIF). (2) DNA sequences (often derived from preserved specimens) are available from the European Nucleotide Archive (ENA) and National Center for Biotechnology Information (NCBI), having accession numbers as their primary means of citation. (3) Taxa, identified by taxon names, are increasingly registered to nomenclatural reference databases (ZooBank, International Plant Names Index (IPNI)) and aggregated in the Catalogue of Life (CoL). (4) Taxonomic treatments combine the former three; they define taxa, express scientific opinions about existing taxa, based upon specimens as well as DNA sequences derived from them, and coin respective names; they are available from TreatmentBank (as well as Zenodo/Biodiversity Literature Repository (BLR) and Swiss Institute of Bioinformatics Literature Services (SIBiLS), and GBIF).\u0000 Traditionally, treatments cite specimens, taxa, and other treatments in mainly human-centric ways, describing where to find the cited object, but they are not immediately actionable in a digital sense. Specimen citations use institution and collection codes and catalog numbers (often combined with geographical and environmental data). Taxon names are a type of self-citing entities, especially when given in combination with their (bibliographic) authorship, as they represent a historical approach to human-readable taxon identifiers. 
Citations of treatments are very similar to those of taxon names, adding (bibliographic) information of subsequent name usages as needed. Accession numbers for DNA sequences are the closest to modern digital identifiers. However, none of these means of citation, as usually found in literature, are readily machine actionable, which makes them hard to process at scale and analyze programmatically. Identifiers coined by the various data providers, in combination with APIs to resolve them, alleviate this problem and enable computational navigation of such links. However, this alone only defers the problem, as actionable identifiers (e.g., HTTP URIs) at some point still need to be inferred from the information given in the traditional means of citation where the latter occur in data.\u0000 Recent projects, like BiCIKL, aim to add machine navigable links to the various entities (or respective data records) at scale, in pursuit of (ideally) fully intermeshed records, connecting (1) treatments to subject taxon names and concepts, cited specimens and DNA sequences, as well as cited treatments (with explicit nomenclatorial implications, e.g., taxon name synonymies or rebuttals thereof), (2) (digital) specimens to assigned taxon names, citing treatments, and any derived DNA sequences,","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87815824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Global Biodata Coalition: Towards a sustainable biodata infrastructure","authors":"Chuck Cook, Guy Cochrane","doi":"10.3897/biss.7.112303","DOIUrl":"https://doi.org/10.3897/biss.7.112303","url":null,"abstract":"Progress in life and biomedical sciences depends absolutely on biodata resources: databases comprising biological data and services around those databases. Supporting scientists in data operations and spanning management, analysis and publication of newly generated data and access to pre-existing reference data, these biodata resources together comprise a critical infrastructure for life science and biomedical research. Familiar scientific infrastructures, for example the Conseil Européen pour la Recherche Nucléaire (CERN) or the Square Kilometer Array, are distinct, constructed, physical entities that are centrally funded and managed at one or more identifiable locations. By contrast, the primary infrastructure of the life sciences, comprised of databases and other biological data resources, is globally distributed, virtually connected, funded from multiple sources, and is not managed as a coordinated entity. While this configuration supports innovation, it lends itself poorly to the long-term sustainability of individual biodata resources and of the infrastructure as a whole. The Global Biodata Coalition (GBC) brings together life science research funding organisations that recognise these challenges and acknowledge the threat that the lack of sustainability poses. They agree to work together to find ways to improve sustainability.\u0000 In the presentation, we will provide an overview of the global biodata resource infrastructure, focusing in particular on challenges to providing sustained long-term funding to the resources that comprise the infrastructure. 
This will provide a global context to other presentations in the session, which focus on biodata resources in Australia.\u0000 Covering some of the work that GBC has carried out to understand and classify biodata resources and the entire biodata resource infrastructure, we will outline the Global Core Biodata Resource programme and Inventory project and also introduce the stakeholder consultation processes around approaches to sustainability and open data. Finally, we will lay out the path GBC is taking to engage researchers, informaticians, funding organisations and other stakeholders in moving towards greater sustainability for these critical resources","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"197 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76232603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Celebrating BHL Australia through the Eye of the (Tasmanian) Tiger","authors":"Nicole Kearney","doi":"10.3897/biss.7.112352","DOIUrl":"https://doi.org/10.3897/biss.7.112352","url":null,"abstract":"BHL Australia, the Australian branch of the Biodiversity Heritage Library (BHL), was launched in 2010 and began operation with a single organisation, Museums Victoria in Melbourne. Since then, it has grown considerably. Funded by the Atlas of Living Australia, BHL Australia now digitises biodiversity literature on behalf of 42 organisations across the country. These organisations include museums, herbaria, state libraries, royal societies, government agencies, field naturalist clubs and natural history publishers, many of whom lack the resources to do this work themselves. BHL Australia’s national consortium model, which makes biodiversity literature accessible on behalf of so many organisations, is unique amongst the BHL global community. Most BHL operations digitise material on behalf of a single organisation.\u0000 BHL Australia has now made over 530,000 pages of Australia’s biodiversity knowledge freely accessible online. The BHL Australia Collection includes both published works (books and journals) and unpublished material (collection registers, field diaries and correspondence). The pages of these works are filled with species descriptions, references to historically significant people and, most importantly, scientific data that is critical to ongoing research and conservation efforts. 
Providing access to materials published as far back as the 1600s and as recently as the current year, the collection chronicles the scientific discovery and understanding of Australia’s biodiversity.\u0000 BHL Australia also leads the global initiative to bring the world's historic biodiversity and taxonomic literature into the modern linked network of scholarly research by incorporating article data into BHL and retrospectively assigning DOIs (Digital Object Identifiers) (Kearney et al. 2021). BHL has now assigned more than 162,000 DOIs to historic publications, making them persistently citable and trackable, both within BHL and beyond. \u0000 This paper will celebrate the achievements of BHL Australia by journeying through the (now accessible, discoverable and DOI'd) Tasmanian Tiger literature. It will showcase:\u0000 previously elusive descriptions (and beautiful illustrations) of Thylacines, including those by Gerhard Krefft (1871) https://doi.org/10.5962/p.314741, and John Gould (1863) https://doi.org/10.5962/p.312790;\u0000 the invaluable creation of links to open access versions from paywalled publications that should be in the public domain, such as the first description of the Thylacine (Harris 1808): open access on BHL; paywalled by Oxford Academic;\u0000 the many citations of historic taxonomic descriptions that are now appearing as clickable DOI links in modern scholarly articles, taxonomic databases, social media, and Wikipedia (Kearney and Page 2022); and\u0000 the efforts being made to encourage more authors to cite the authoritative source of taxonomic names (Benichou 2022).","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84286904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Part in the Swiss Army Knife for Linking Biodiversity Data: The digital specimen identifier service","authors":"W. Addink, Soulaine Theocharides, Sharif Islam","doi":"10.3897/biss.7.112283","DOIUrl":"https://doi.org/10.3897/biss.7.112283","url":null,"abstract":"Digital specimens are new information objects on the internet, which act as digital surrogates of the physical objects they represent. They are designed to be extended with data derived from the specimen, such as genetic, morphological and chemical data, and with data that puts the specimen in the context of its gathering event and the environment it was derived from. This requires linking the digital specimens and their related entities to information about agents, locations, publications, taxa and environmental information. To establish reliable links and (re-)connect data to specimens, a new framework is needed, which creates persistent identifiers (PIDs) for the digital specimen and its related entities. These PIDs should be actionable by machines but can also be used by humans for data citation and communication purposes.\u0000 The framework that enables this is a new PID infrastructure, produced by the European Commission-funded BiCIKL project (Biodiversity Community Integrated Knowledge Library), which creates persistent and actionable identifiers. It is a generic PID infrastructure that will be used by the Distributed System for Scientific Collections research infrastructure (DiSSCo), but it can also be used by other infrastructures and institutions. PIDs minted by DiSSCo will be linked to the digital specimens and samples provided through DiSSCo. The new PIDs are a key element in enabling the concept of Digital Extended Specimens (Webster et al. 2021) and provide unique and resolvable references to enable bidirectional linking. \u0000 DiSSCo has done extensive work to select the most appropriate PID scheme (Hardisty et al. 2021) and to design a PID infrastructure for the pan-European specimens. 
The draft design has been discussed with technical specialists in the joint DiSSCo and Consortium of European Taxonomic Facilities (CETAF) community, with international stakeholders like the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio) and was discussed at the 2022 conference of the Society for the Preservation of Natural History Collections (SPNHC). A first implementation was demonstrated in the Biodiversity Information Standards (TDWG) annual conference in 2022 and illustrated key elements in the design. To be able to provide digital specimen identifiers as DOIs (Digital Object Identifiers), a pilot project was started in 2023 with DataCite to investigate if Digital Specimen DOIs in the new PID infrastructure can be created using the DataCite service. The pilot aim was to create metadata crosswalks to the DataCite schema in consultation with the DataCite Metadata Working Group, to evaluate synergies with the IGSN (International Generic Sample Number) metadata schema, to develop and test PID kernel metadata registration, and to evaluate performance and the impact of using DataCite services. There are around two billion specimens and creating PIDs for them as DOIs requires creating DOIs at an unprecedented scale. Also,","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89183646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Want to Describe and Share Biodiversity Inventory and Monitoring Data? The Humboldt Extension for Ecological Inventories Can Help!","authors":"Y. Sica, Wesley Hochachka, Yi-Ming Gan, Kate Ingenloff, Dmitry Schigel, Robert Stevenson, Steven Baskauf, Peter Brenton, Anahita J. N. Kazem, John Wieczorek","doi":"10.3897/biss.7.112229","DOIUrl":"https://doi.org/10.3897/biss.7.112229","url":null,"abstract":"Access to high-quality ecological data is critical to assessing and modeling biodiversity and its changes through space and time. The Darwin Core standard has proven to be immensely helpful in sharing species occurrence data (see Wieczorek et al. 2012, Global Biodiversity Information Facility, GBIF) and promoting biodiversity research following the FAIR principles of findability, accessibility, interoperability and reusability (Wilkinson et al. 2016). However, it is limited in its ability to fully accommodate inventory data (i.e., linked records of multiple taxa at a specific place and time). Information about the inventory processes is often either unreported or described in an unstructured manner, limiting its potential re-use for larger-scale analyses. Two key aspects that are not captured in a structured manner yet are: i) information about the species that were not detected during an inventory, and ii) ancillary information about sampling effort and completeness.\u0000 Non-detections (i.e., reported counts of zero) potentially enable more accurate and precise estimates of distribution, abundance, and changes in abundance. This becomes possible when variation in effort is used to estimate the likelihood that a non-detection represents a true absence of that taxon during the inventory. Currently, ecological inventory data, when shared at all, are typically discoverable through dataset catalogs (e.g., governmental data repositories) and supplementary materials to publications. 
With few exceptions, indexing of such data with the detail and structure needed has not been attempted at broad temporal and spatial scales, despite the potentially high value resulting from making inventory data more readily accessible. To address these limitations in documenting inventory data using the Darwin Core, Guralnick et al. (2018) proposed the Humboldt Core. Subsequent discussions within the biodiversity standards community made it clear that greater integration could be achieved by creating an extension of the Darwin Core, rather than developing a new standard in isolation. Extension design work began in 2021 and progress has been reported by Brenton (2021) and Sica et al. (2022). Over the last year the Humboldt Extension Task Group has sought advice from data providers and aggregators and updated its vocabulary terms. A challenging aspect has been creating terminology for the parent-child relationships (see Properties of Hierarchical Events) needed to describe surveys that may be as simple as a collection of checklists (one level of hierarchy) or as complex as species records from traps within plots along transects across habitats over multiple years (at least four levels of hierarchy). The Task Group has committed to completing a User Guide for the Humboldt Extension.
Group members who contributed to the Darwin Core (Darwin Core Task Group 2009) and the Vocabulary Maintenance Specification (Vocabulary Maintenance Specification Task Group 2017) have provided va","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"1939 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91122617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Biodata Infrastructure within Australia and Beyond: Landscapes and horizons","authors":"Jeff Christiansen, Kathryn Hall","doi":"10.3897/biss.7.112274","DOIUrl":"https://doi.org/10.3897/biss.7.112274","url":null,"abstract":"In current life science practice, digital data are associated with all parts of the research lifecycle. Data generation and management are planned during project conception; data are collected from numerous instruments or existing sources, prepared for analysis, and analysed to generate new knowledge and information; and then (hopefully) preserved so that they may be found, shared and re-used by others when appropriate. This session will begin with a scan of the biodata and biodata infrastructure landscape within Australia. We will explore which organisations fund biodata generation, where data are processed and stored, and how data are made available for reuse by others. Important global and complementary data resources that are hosted offshore will also be discussed. To guarantee reproducibility and integrity for life sciences research, it is critical that each of these infrastructures (whether hosted on- or off-shore) is maintained for the long term. As an example of a resource that utilises a mixture of existing on- and off-shore data infrastructures to underpin a critical research need, the Australian Reference Genome Atlas (ARGA) will be discussed. ARGA is solving the problem of genomics data obscurity for Australian-relevant species by creating an online platform where life sciences researchers can comprehensively and confidently search for genomic data for taxa relevant to Australian research.
Publicly available genomics (and genetics) data are aggregated and indexed from multiple sources (both on- and off-shore), and then integrated with occurrence records and the taxonomic frameworks of the Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA) to enrich the genomic data and make them searchable using taxonomy, location, ecological characteristics and selected phenotypic data. The presentation sets the scene for a subsequent talk by members of the Global Biodata Coalition (GBC), who will outline the challenges in sustaining the types of disseminated infrastructure discussed and the GBC’s work with the funders who support many of these resources to ensure long-term funding for existing infrastructure, while also channelling support to underpin future growth in data volumes and new technologies.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"32 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87150039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}