I Know Something You Don’t Know: The annotation saga continues…

Biodiversity Information Science and Standards. Pub Date: 2023-09-14. DOI: 10.3897/biss.7.112715

James Macklin, David Shorthouse, Falko Glöckler



Abstract

Over the past 20 years, the biodiversity informatics community has pursued components of the digital annotation landscape with varying degrees of success. We will provide a historical overview of the theory, review the advancements made through a few key projects, and identify some of the ongoing challenges and opportunities. The fundamental principles have remained unchanged since annotations were first proposed. Someone (or something): (1) has an enhancement to make somewhere other than the source where original data or information are generated or transcribed; (2) wishes to broadcast these statements to the originator and to others who may benefit; and (3) expects persistence, discoverability, and attribution for their contributions alongside the source. The Filtered Push project (Morris et al. 2013) considered several use cases and pioneered development of services based on the technology of the day. The exchange of data between parties in a universally consistent way necessitated the development of a novel draft standard for data annotations, an extension of the World Wide Web Consortium’s Web Annotation Working Group standard (Sanderson et al. 2013), so that an annotation is sufficiently informative for a data curator to confidently make a decision. Figure 2 from Morris et al. (2013), reproduced here as Fig. 1, outlines the composition of an annotation data package for a taxonomic identification. The package contains the data object(s) associated with an occurrence, an expression of the motivation(s) for updating, some evidence for an assertion, and a stated expectation for how the receiving entity should take action. The Filtered Push and Annosys (Tschöpe et al. 2013) projects also considered implementation strategies involving collection management systems (e.g., Symbiota) and portals (e.g., European Distributed Institute of Taxonomy, EDIT). 
However, there remain technological barriers for these systems to operate at scale, not the least of which is the absence of globally unique, persistent, resolvable identifiers for shared objects and concepts. Major aggregation infrastructures like the Global Biodiversity Information Facility (GBIF) and the Distributed System of Scientific Collections (DiSSCo) rely on data enhancement to improve the quality of their resources and have annotation services in their work plans. The more recent Digital Extended Specimen (DES) concept (Hardisty et al. 2022) will rely on annotation services as key components of the proposed infrastructure. Recent work on annotation services more generally has considered various new forms of packaging and delivery such as Frictionless Data (Fowler et al. 2018), Journal Article Tag Suite XML (Agosti et al. 2022), or nanopublications (Kuhn et al. 2018). There is a risk of fragmenting this landscape and disenfranchising both biological collections and the wider research community if we fail to align the purpose, content, and structure of these packages or if these fail to remain aligned with FAIR principles. Institutional collection management systems currently represent the canonical data store that provides data to researchers and data aggregators. It is critical that information and/or feedback about the data they release be round-tripped back to them for consideration. However, the sheer volume of annotations that could be generated by both human and machine curation processes will overwhelm local data curators and the systems supporting them. One solution to this is to create a central annotation store with write and discovery services that best support the needs of all stewards of data. This will require an international consortium of parties with a governance and technical model to assure its sustainability.
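
As a rough illustration only, the annotation package described in the abstract (data object, motivation, evidence, expectation) and a central store with write and discovery services might be sketched as follows. Every class, field, and identifier here is hypothetical: the sketch does not follow the Filtered Push draft standard or the W3C annotation vocabulary, and an in-memory dictionary stands in for a governed, persistent service.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical annotation package; field names are illustrative only."""
    target_id: str    # identifier of the annotated occurrence record
    annotator: str    # person or agent to credit (attribution)
    motivation: str   # why the update is proposed
    evidence: str     # support for the assertion
    expectation: str  # action requested of the receiving entity
    body: dict = field(default_factory=dict)  # proposed data values


class AnnotationStore:
    """Toy in-memory stand-in for a central annotation store."""

    def __init__(self):
        self._by_target = defaultdict(list)

    def write(self, annotation):
        # Write service: keep the annotation alongside its target identifier.
        self._by_target[annotation.target_id].append(annotation)

    def discover(self, target_id):
        # Discovery service: return every annotation known for a target.
        return list(self._by_target.get(target_id, []))


store = AnnotationStore()
store.write(Annotation(
    target_id="urn:example:occurrence:123",    # made-up identifier
    annotator="A. Botanist",                   # made-up annotator
    motivation="updating identification",
    evidence="Compared against the type specimen",
    expectation="update",
    body={"scientificName": "Quercus alba"},   # example value only
))
found = store.discover("urn:example:occurrence:123")
```

Keying the store on the target identifier reflects the abstract's point that globally unique, persistent, resolvable identifiers are a precondition: without them, an annotation cannot be reliably routed back to the source record or discovered by others.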
