A property graph schema for automated metadata capture, reproducibility and knowledge discovery in high-throughput bioprocess development†

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital discovery Pub Date : 2025-06-12 DOI:10.1039/D5DD00070J

Federico M. Mione, Martin F. Luna, Lucas Kaspersetz, Peter Neubauer, Ernesto C. Martinez and M. Nicolas Cruz Bournazou

{"title":"A property graph schema for automated metadata capture, reproducibility and knowledge discovery in high-throughput bioprocess development†","authors":"Federico M. Mione, Martin F. Luna, Lucas Kaspersetz, Peter Neubauer, Ernesto C. Martinez and M. Nicolas Cruz Bournazou","doi":"10.1039/D5DD00070J","DOIUrl":null,"url":null,"abstract":"<p >Recent advances in autonomous experimentation and self-driving laboratories have drastically increased the complexity of orchestrating robotic experiments and of recording the different computational processes involved including all related metadata. Addressing this challenge requires a flexible and scalable information storage system that prioritizes the relationships between data and metadata, surpassing the limitations of traditional relational databases. To foster knowledge discovery in high-throughput bioprocess development, the computational control of the experimentation must be fully automated, with the capability to efficiently collect and manage experimental data and their integration into a knowledge base. This work proposes the adoption of graph databases integrated with a semantic structure to enable knowledge transfer between humans and machines. To this end, a property graph schema (PG-schema) has been specifically designed for high-throughput experiments in robotic platforms, focused mainly on the automation of the computational workflow used to ensure the reproducibility, reusability, and credibility of learned bioprocess models. A prototype implementation of the PG-schema and its integration with the workflow management system using simulated experiments is presented to highlight the advantages of the proposed approach in the generation of FAIR data.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 9","pages":" 2401-2422"},"PeriodicalIF":6.2000,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00070j?page=search","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Digital discovery","FirstCategoryId":"1085","ListUrlMain":"https://pubs.rsc.org/en/content/articlelanding/2025/dd/d5dd00070j","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}

引用次数: 0

Abstract

Recent advances in autonomous experimentation and self-driving laboratories have drastically increased the complexity of orchestrating robotic experiments and of recording the different computational processes involved including all related metadata. Addressing this challenge requires a flexible and scalable information storage system that prioritizes the relationships between data and metadata, surpassing the limitations of traditional relational databases. To foster knowledge discovery in high-throughput bioprocess development, the computational control of the experimentation must be fully automated, with the capability to efficiently collect and manage experimental data and their integration into a knowledge base. This work proposes the adoption of graph databases integrated with a semantic structure to enable knowledge transfer between humans and machines. To this end, a property graph schema (PG-schema) has been specifically designed for high-throughput experiments in robotic platforms, focused mainly on the automation of the computational workflow used to ensure the reproducibility, reusability, and credibility of learned bioprocess models. A prototype implementation of the PG-schema and its integration with the workflow management system using simulated experiments is presented to highlight the advantages of the proposed approach in the generation of FAIR data.

Abstract Image

查看原文本刊更多论文

高通量生物工艺开发中用于自动元数据捕获、再现性和知识发现的属性图模式

自主实验和自动驾驶实验室的最新进展大大增加了编排机器人实验和记录涉及的不同计算过程（包括所有相关元数据）的复杂性。解决这一挑战需要一个灵活且可扩展的信息存储系统，该系统可以优先考虑数据和元数据之间的关系，从而超越传统关系数据库的限制。为了促进高通量生物工艺开发中的知识发现，实验的计算控制必须完全自动化，具有有效收集和管理实验数据并将其集成到知识库中的能力。这项工作提出了采用与语义结构集成的图形数据库来实现人与机器之间的知识转移。为此，专门为机器人平台的高通量实验设计了一个属性图模式（PG-schema），主要关注用于确保学习生物过程模型的再现性、可重用性和可信度的计算工作流的自动化。通过模拟实验，给出了pg模式的原型实现及其与工作流管理系统的集成，以突出该方法在生成FAIR数据方面的优势。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Digital discovery

CiteScore

2.80

自引率

0.00%

发文量