A Simple Recipe for Cooking your AI-assisted Dish to Serve it in the International Digital Specimen Architecture

Biodiversity Information Science and Standards, published 2023-09-14, DOI: 10.3897/biss.7.112678

Wouter Addink, Sam Leeflang, Sharif Islam

Abstract

With the rise of Artificial Intelligence (AI), a large set of new tools and services is emerging that support specimen data mapping, standards alignment, quality enhancement and data enrichment. These tools currently operate in isolation, each targeted at an individual collection, collection management system or institutional dataset. To address this fragmentation, DiSSCo, the Distributed System of Scientific Collections, is developing a new infrastructure for digital specimens that transforms them into actionable information objects. This infrastructure incorporates a framework for annotation and curation that allows both experts and machines to enrich or enhance the objects. This creates a unique opportunity to plug in AI-assisted services, which can then use digital specimens through this infrastructure; the infrastructure serves as a harmonised Findable, Accessible, Interoperable and Reusable (FAIR) abstraction layer on top of individual institutional systems or datasets. Early examples of such services are those developed in the Specimen Data Refinery workflow (Hardisty et al. 2022). The new architecture, DS Arch (Digital Specimen Architecture), is built on the concept of FAIR Digital Objects (FDO) (Islam et al. 2020). All digital specimens and related objects are served with persistent identifiers and machine-readable FDO records, which contain information for machines about the object together with a pointer to its machine-readable type description. The type describes the structure of the object and its attributes, and specifies the allowed operations. The digital specimen type and specimen media type are based on existing Biodiversity Information Standards (TDWG) such as Darwin Core, the Access to Biological Collection Data (ABCD) Schema and the Audiovisual Core Multimedia Resources Metadata Schema, and include support for annotation operations based on the World Wide Web Consortium (W3C) Web Annotation Data Model.
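The resolution step described above (an FDO record that points to a type description, which in turn lists the allowed operations) can be sketched in Python. This is a minimal illustration under stated assumptions, not the DS Arch implementation: the classes `FdoRecord` and `TypeDescription`, the in-memory `TYPE_REGISTRY`, the `example/...` identifiers and the operation names `read` and `annotate` are all hypothetical. A real deployment would resolve persistent identifiers through a PID service rather than a dictionary. Only the three attribute names are real Darwin Core terms.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TypeDescription:
    """Hypothetical machine-readable type: structure, attributes and allowed operations."""
    pid: str
    name: str
    attributes: frozenset
    allowed_operations: frozenset


@dataclass
class FdoRecord:
    """Hypothetical minimal FDO record: a persistent identifier plus a pointer to its type."""
    pid: str
    type_pid: str
    metadata: dict = field(default_factory=dict)


# Hypothetical in-memory stand-in for resolving type identifiers.
TYPE_REGISTRY = {
    "example/type/digital-specimen": TypeDescription(
        pid="example/type/digital-specimen",
        name="DigitalSpecimen",
        attributes=frozenset({"dwc:scientificName", "dwc:country", "dwc:eventDate"}),
        allowed_operations=frozenset({"read", "annotate"}),
    ),
}


def resolve_type(record: FdoRecord, registry: dict) -> TypeDescription:
    """Follow the record's type pointer to its machine-readable type description."""
    try:
        return registry[record.type_pid]
    except KeyError:
        raise LookupError(
            f"no type description {record.type_pid!r} for {record.pid!r}"
        ) from None


def is_operation_allowed(record: FdoRecord, operation: str, registry: dict) -> bool:
    """A service consults the type, not the object itself, to learn what it may do."""
    return operation in resolve_type(record, registry).allowed_operations


if __name__ == "__main__":
    specimen = FdoRecord(pid="example/specimen/0001",
                         type_pid="example/type/digital-specimen")
    print(is_operation_allowed(specimen, "annotate", TYPE_REGISTRY))  # True
    print(is_operation_allowed(specimen, "delete", TYPE_REGISTRY))    # False
```

Because the allowed operations live on the shared type rather than on each specimen, a service written against this type would behave the same way for every specimen that points to it.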
These machine-readable types enable AI-assisted services registered with DS Arch to discover digital specimen objects autonomously and to determine which actions they are authorised to perform. AI-assisted services can support various tasks, such as digitisation, extracting new information from specimen images, creating relations with other objects or standardising data. These operations can be performed autonomously, on user request, or in tandem with expert validation. Because every digital specimen served through DS Arch has the same uniform FDO representation, registered AI-assisted services can interact with all digital specimens worldwide in the same way, even when the specimens differ in content richness, level of standardisation and scope. DS Arch has been designed to serve digital specimens for living and preserved specimens, as well as for preserved environmental, earth system and astrogeology samples. With AI-assisted services, data can be annotated with new data, alternative values, corrections and new entity relationships. As a result, the digital specimens become Digital Extended Specimens, enabling new science and applications (Webster et al. 2021). Once a sophisticated trust model for community acceptance is implemented in DS Arch, these annotations will become part of the data itself and can be made available for inclusion in source systems such as collection management systems, and in aggregators such as the Global Biodiversity Information Facility (GBIF), the Geoscience Collections Access Service (GeoCASe) and the Catalogue of Life. In the session, we aim to demonstrate how AI-assisted services can be registered and used to annotate specimen data. Although the DiSSCo DS Arch is still in development and planned to become operational in 2025, a sandbox environment is already available in which the concept can be tested and AI-assisted services can be piloted to act on digital specimen data.
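An annotation produced by an AI-assisted service could be expressed in the JSON-LD serialisation of the W3C Web Annotation Data Model. The sketch below builds one annotation on a single specimen attribute. The `@context`, `type`, `motivation`, `creator`, `created`, `body` and `target` keys, and the `Annotation`, `TextualBody` and `Software` types, come from the W3C model. Everything else is an assumption for illustration: the function `make_annotation`, the mapping from the four kinds of annotation mentioned above to W3C motivations, the `FieldSelector` selector (not a selector type defined by the W3C model) and all identifiers.

```python
import json
from datetime import datetime, timezone

# Hypothetical mapping from the kinds of annotation named in the text
# to motivations defined in the W3C Web Annotation vocabulary.
MOTIVATION_BY_KIND = {
    "new_data": "describing",
    "alternative_value": "commenting",
    "correction": "editing",
    "relationship": "linking",
}


def make_annotation(specimen_pid, field_name, value, kind, service_name, created=None):
    """Build a Web Annotation (JSON-LD dict) proposing `value` for one specimen field."""
    if kind not in MOTIVATION_BY_KIND:
        raise ValueError(f"unsupported annotation kind: {kind!r}")
    created = created or datetime.now(timezone.utc).isoformat()
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": MOTIVATION_BY_KIND[kind],
        # The machine agent that produced the annotation.
        "creator": {"type": "Software", "name": service_name},
        "created": created,
        "body": {"type": "TextualBody", "value": value},
        "target": {
            "source": specimen_pid,
            # Hypothetical selector pointing at one attribute of the specimen;
            # it is NOT one of the selector types defined by the W3C model.
            "selector": {"type": "FieldSelector", "field": field_name},
        },
    }


if __name__ == "__main__":
    annotation = make_annotation(
        "example/specimen/0001", "dwc:country", "Netherlands",
        kind="correction", service_name="example-geo-normaliser",
    )
    print(json.dumps(annotation, indent=2))
```

Keeping the proposed value in the annotation body, rather than overwriting the specimen record, leaves room for the expert validation and trust model described above to decide whether the change is accepted.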
For testing purposes, sandbox operations are currently limited to individual specimens and open data; batch operations will, however, become possible in the future production environment.
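The current sandbox limits (one specimen per operation, open data only) amount to a simple pre-flight check on each request. The following sketch is hypothetical: the `check_sandbox_request` function, the `SandboxRestriction` exception and the choice of CC0 1.0 and CC BY 4.0 as the licences counted as open data are assumptions for illustration, not the sandbox's actual rules.

```python
# Hypothetical: licences this sketch treats as "open data".
OPEN_LICENCES = frozenset({
    "https://creativecommons.org/publicdomain/zero/1.0/",
    "https://creativecommons.org/licenses/by/4.0/",
})


class SandboxRestriction(Exception):
    """Raised when a request falls outside the (assumed) sandbox limits."""


def check_sandbox_request(targets: list) -> str:
    """Accept a request only if it targets exactly one openly licensed specimen.

    `targets` holds (specimen_pid, licence_uri) pairs; returns the accepted PID.
    """
    if len(targets) != 1:
        # Batch operations are not available in the sandbox.
        raise SandboxRestriction(
            f"sandbox accepts exactly one specimen per operation, got {len(targets)}"
        )
    pid, licence = targets[0]
    if licence not in OPEN_LICENCES:
        raise SandboxRestriction(f"{pid} is not published as open data")
    return pid


if __name__ == "__main__":
    ok = [("example/specimen/0001", "https://creativecommons.org/licenses/by/4.0/")]
    print(check_sandbox_request(ok))  # example/specimen/0001
    try:
        check_sandbox_request(ok * 2)
    except SandboxRestriction as err:
        print(err)  # sandbox accepts exactly one specimen per operation, got 2
```

In a production environment that allows batch operations, the length check would presumably be lifted, while a licence or permission check could still be applied to each target individually.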
