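Data Standards and Interoperability Challenges for Biodiversity Digital Twin: A novel and transformative approach to biodiversity research and application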

Biodiversity Information Science and Standards. Published: 2023-09-11. DOI: 10.3897/biss.7.112373

Sharif Islam, Hanna Koivula, Dag Endresen, Erik Kusch, Dmitry Schigel, Wouter Addink

Abstract

The Biodiversity Digital Twin (BioDT) project (2022-2025) aims to create prototypes that integrate various datasets, models, and expert domain knowledge to provide prediction capabilities and decision-making support for critical issues in biodiversity dynamics. While digital twin concepts have been applied in industry for continuous monitoring of physical phenomena, their application in biodiversity and environmental sciences presents novel challenges (Bauer et al. 2021, de Koning et al. 2023). In addition, successfully developing digital twins for biodiversity requires addressing interoperability challenges in data standards. BioDT is developing prototype digital twins based on use cases that span a range of data complexities, from point occurrence data to bioacoustics, and a range of scales, from nationwide forest states to specific communities and individual species.

The project relies on the FAIR principles (Findable, Accessible, Interoperable, and Reusable) and on FAIR-enabling resources such as standards and vocabularies (Schultes et al. 2020) to enable the exchange, sharing, and reuse of biodiversity information, fostering collaboration among the participating research infrastructures (DiSSCo, eLTER, GBIF, and LifeWatch) and data providers. It also involves creating a harmonised abstraction layer using Persistent Identifiers (PIDs) and FAIR Digital Object (FDO) records, alongside semantic mapping and crosswalk techniques, to provide machine-actionable metadata (Schultes and Wittenburg 2019, Schwardmann 2020). Governance and engagement with research infrastructure stakeholders play crucial roles here, with a focus on aligning discussions of technical and data standards.

In addition to data, models and workflows are key elements in BioDT. In the BioDT context, models are formal representations of problems or processes, implemented through equations, algorithms, or a combination of both, that can be executed by machines.
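The harmonised abstraction layer and the model descriptions mentioned above can be pictured with a small Python sketch. It is illustrative only: the crosswalk tables, the `FDORecord` and `ModelDescription` classes, their fields, and the example PIDs are hypothetical and are neither BioDT's data model nor the FDO specification; the only real vocabulary used is four Darwin Core (TDWG) terms. The sketch maps records from two differently structured sources onto shared terms, wraps each in a PID-keyed record, and checks whether a described model's input requirements are met.

```python
from dataclasses import dataclass, field

# Hypothetical crosswalks: each maps one source schema's field names onto shared
# Darwin Core terms. "dwc_native" already uses Darwin Core names; "local_collection"
# is an invented in-house schema.
CROSSWALKS = {
    "dwc_native": {
        "scientificName": "dwc:scientificName",
        "decimalLatitude": "dwc:decimalLatitude",
        "decimalLongitude": "dwc:decimalLongitude",
        "eventDate": "dwc:eventDate",
    },
    "local_collection": {
        "taxon": "dwc:scientificName",
        "lat": "dwc:decimalLatitude",
        "lon": "dwc:decimalLongitude",
        "collected_on": "dwc:eventDate",
    },
}


@dataclass
class FDORecord:
    """Hypothetical FDO-style record: a persistent identifier plus harmonised metadata."""
    pid: str
    fdo_type: str
    source: str
    attributes: dict = field(default_factory=dict)


def to_fdo(pid: str, source: str, raw: dict, fdo_type: str = "OccurrenceRecord") -> FDORecord:
    """Map a source record onto shared terms; refuse fields the crosswalk does not cover."""
    mapping = CROSSWALKS[source]
    unmapped = sorted(set(raw) - set(mapping))
    if unmapped:
        raise ValueError(f"no crosswalk entry in {source!r} for {unmapped}")
    return FDORecord(pid, fdo_type, source, {mapping[k]: v for k, v in raw.items()})


@dataclass
class ModelDescription:
    """Hypothetical model metadata: the kind of model and the terms it needs as input."""
    name: str
    kind: str  # e.g. "statistical" or "mechanistic"
    required_terms: frozenset

    def accepts(self, record: FDORecord) -> bool:
        # A record is usable if it supplies every term the model requires.
        return self.required_terms.issubset(record.attributes)


if __name__ == "__main__":
    a = to_fdo("https://example.org/pid/0001", "dwc_native",
               {"scientificName": "Parus major", "decimalLatitude": 60.17,
                "decimalLongitude": 24.94, "eventDate": "2023-05-01"})
    b = to_fdo("https://example.org/pid/0002", "local_collection",
               {"taxon": "Parus major", "lat": 60.17, "lon": 24.94,
                "collected_on": "2023-05-01"})
    sdm = ModelDescription("toy-distribution-model", "statistical",
                           frozenset({"dwc:scientificName", "dwc:decimalLatitude",
                                      "dwc:decimalLongitude"}))
    print(a.attributes == b.attributes, sdm.accepts(a), sdm.accepts(b))  # True True True
```

Rejecting unmapped fields, rather than silently dropping them, is one way to keep crosswalk gaps visible; a production layer would more likely resolve terms against published vocabularies than hard-code mapping tables.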
The current twin prototypes consider both statistical and mechanistic models, which differ significantly in (1) data requirements, (2) modelling approaches and philosophy, and (3) model output. The BioDT consortium will develop guidelines and protocols for how to describe these models, what metadata to include, and how the models will interact with the diverse datasets. While this topic is discussed in the broader context of biodiversity and ecological sciences (Jeltsch et al. 2013, Fer et al. 2020), the BioDT project is strongly committed to finding a solution within its own scope.

In the twinning context, data and models need to be processed and executed within a computing infrastructure, and they also need to adhere to the FAIR principles. Software within BioDT includes a suite of tools that facilitate data acquisition, storage, processing, and analysis. While some of these tools already exist, the challenge lies in integrating them within the digital twinning framework. One approach to integration is workflow representation: standardised procedures and protocols that guide the acquisition, packaging, processing, and analysis of data. The project is exploring an implementation based on Research Object Crate (RO-Crate) for this purpose (Soiland-Reyes et al. 2022). Implementing workflows can help ensure reproducibility, scalability, and transparency in research practices, enabling scientists to validate and replicate findings.

The BioDT project offers a novel and transformative approach to biodiversity research and application. By leveraging collaborative research infrastructures and adhering to data standards, BioDT aims to harness data, software, supercomputers, models, and expertise to provide new insights. The foundation provided by data standards, including those of Biodiversity Information Standards (TDWG), is crucial to realising the full potential of digital twins, facilitating the seamless integration of diverse data sources and their combination with models.
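The RO-Crate packaging of workflows mentioned above can be sketched with the Python standard library alone. The snippet builds a minimal `ro-crate-metadata.json` following the RO-Crate 1.1 layout (a metadata descriptor, a root `Dataset`, and data entities) and marks the workflow as the crate's `mainEntity` with the `ComputationalWorkflow` type, in the manner of the Workflow RO-Crate profile. The function names, file names, date, and license are assumptions for illustration; this is not BioDT's implementation and does not aim at full profile conformance (dedicated tooling such as ro-crate-py exists for real use).

```python
import json
from pathlib import Path

RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
RO_CRATE_SPEC = "https://w3id.org/ro/crate/1.1"


def build_workflow_crate(name, description, published, license_url, workflow_file, data_files):
    """Return a minimal RO-Crate 1.1 metadata document (JSON-LD) as a dict."""
    graph = [
        {   # Metadata descriptor: states which spec this file conforms to and what it describes.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": RO_CRATE_SPEC},
            "about": {"@id": "./"},
        },
        {   # Root data entity: the packaged research object as a whole.
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": published,
            "license": {"@id": license_url},
            "mainEntity": {"@id": workflow_file},
            "hasPart": [{"@id": f} for f in [workflow_file, *data_files]],
        },
        {   # The workflow that acquires, processes, and analyses the data.
            "@id": workflow_file,
            "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        },
    ]
    # Plain data files consumed or produced by the workflow.
    graph += [{"@id": f, "@type": "File"} for f in data_files]
    return {"@context": RO_CRATE_CONTEXT, "@graph": graph}


def write_crate(crate, directory):
    """Write the metadata file into the crate directory and return its path."""
    path = Path(directory) / "ro-crate-metadata.json"
    path.write_text(json.dumps(crate, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    crate = build_workflow_crate(
        name="Example occurrence-to-model run",  # hypothetical
        description="Hypothetical crate bundling a workflow with its input data.",
        published="2024-01-01",
        license_url="https://spdx.org/licenses/CC-BY-4.0",
        workflow_file="workflow.py",
        data_files=["occurrences.csv", "climate_layers.nc"],
    )
    print(json.dumps(crate, indent=2))
```

Keeping the JSON-LD as plain dictionaries makes the graph easy to inspect and diff; the files themselves sit next to `ro-crate-metadata.json` in the crate directory.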
