Natural language processing for automated workflow and knowledge graph generation in self-driving labs

Impact factor: 6.2, JCR Q1 (Chemistry, Multidisciplinary)

Digital discovery, Pub Date: 2025-05-05, DOI: 10.1039/D5DD00063G

Bastian Ruehle

Digital discovery, 6, 1534-1543. Article page: https://pubs.rsc.org/en/content/articlelanding/2025/dd/d5dd00063g

Citations: 0

Abstract



Natural language processing with large language models such as ChatGPT has become ubiquitous in software applications and lets users interact intuitively with even complex hardware or software. Self-Driving Labs and Material Acceleration Platforms stand to benefit greatly from becoming more accessible to a broader scientific community, whether through enhanced user-friendliness or through fully automated generation of experimental workflows, from user input or previously published procedures, that can run on the platform's complex hardware. Here, two new datasets with over 1.5 million experimental procedures and their (semi)automatic annotations as action graphs, i.e., structured output, were created and used to train two different transformer-based large language models. These models strike a balance between performance, generality, and fitness for purpose and can be hosted and run on standard consumer-grade hardware. Furthermore, the paper explores generating node graphs from these action graphs as a user-friendly, intuitive way of visualizing and modifying synthesis workflows that can be run on the hardware of a Self-Driving Lab or Material Acceleration Platform. Lastly, it discusses how knowledge graphs, which follow an ontology imposed by the underlying node setup and software architecture, can be generated from the node graphs. All resources, including the datasets, the fully trained large language models, the node editor, and scripts for querying and visualizing the knowledge graphs, are made publicly available.
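To make the terms "action graph" and "knowledge graph" in the abstract more concrete, the following is a minimal sketch of how a procedure annotated as a sequence of actions could be flattened into subject-predicate-object triples. Everything in it is an assumption made for illustration: the `Action` class, the `to_triples` helper, the predicate names (`hasStep`, `hasAction`, `followedBy`), and the sample procedure are hypothetical and are not taken from the paper's datasets, models, or ontology.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Action:
    """One step of a procedure. Field names are illustrative assumptions."""
    name: str                                    # e.g. "add", "stir", "heat"
    params: dict = field(default_factory=dict)   # free-form step parameters


def to_triples(procedure_id: str, actions: list[Action]) -> list[tuple[str, str, str]]:
    """Flatten a linear action sequence into (subject, predicate, object) triples."""
    triples = []
    previous = None
    for index, action in enumerate(actions):
        step = f"{procedure_id}/step{index}"
        triples.append((procedure_id, "hasStep", step))
        triples.append((step, "hasAction", action.name))
        # Each parameter becomes its own edge from the step node.
        for key, value in action.params.items():
            triples.append((step, key, str(value)))
        # Record execution order as an edge between consecutive steps.
        if previous is not None:
            triples.append((previous, "followedBy", step))
        previous = step
    return triples


# Hand-written example procedure; not taken from the paper's datasets.
procedure = [
    Action("add", {"chemical": "water", "amount": "10 mL"}),
    Action("stir", {"duration": "30 min"}),
]
triples = to_triples("proc1", procedure)
for triple in triples:
    print(triple)
```

A triple list like this can be loaded into any graph store or queried directly; the paper's own knowledge graphs follow an ontology defined by its node setup and software architecture, which this sketch does not attempt to reproduce.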


Source journal

Digital discovery

CiteScore

2.80

Self-citation rate

0.00%

Publication count