Extraction of chemical synthesis information using the World Avatar

Impact factor: 6.2 · JCR: Q1 (Chemistry, Multidisciplinary)
Simon D. Rihm, Fabio Saluz, Aleksandar Kondinski, Jiaru Bai, Patrick W. V. Butler, Sebastian Mosbach, Jethro Akroyd and Markus Kraft
Digital Discovery, 10, 2893-2909. Published 2025-08-26. DOI: 10.1039/D5DD00183H
Article page: https://pubs.rsc.org/en/content/articlelanding/2025/dd/d5dd00183h
PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00183h?page=search
Citations: 0

Abstract

This work presents a generalisable process that transforms unstructured synthesis descriptions of metal–organic polyhedra (MOPs), a class of organometallic nanocages, into machine-readable, structured representations and integrates them into The World Avatar (TWA), a universal knowledge representation encompassing physical, abstract, and conceptual entities. TWA makes use of knowledge graphs and semantic agents. Previous work established rational design principles for MOPs in the context of TWA, but experimental verification remains a bottleneck due to the lack of accessible and structured synthesis data. Synthesis information in the literature is often sparse, ambiguous, and embedded with implicit knowledge, making direct translation into structured formats a significant challenge. To address this, a synthesis ontology was developed to standardise the representation of chemical synthesis procedures by building on existing standardisation efforts. We then designed an LLM-based pipeline with advanced prompt engineering strategies to automate data extraction and created workflows for seamless integration into a knowledge representation within TWA. Using this approach, we extracted and uploaded nearly 300 synthesis procedures, automatically linking reactants, chemical building units, and MOPs to related entities across interconnected knowledge graphs. Over 90% of publications were processed successfully through the fully automated pipeline without manual intervention. The demonstrated use cases show that this framework supports chemists in designing and executing experiments and enables data-driven retrosynthetic analysis, laying the groundwork for autonomous, knowledge-guided discovery in reticular chemistry.
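As a rough illustration of what a "machine-readable, structured representation" of a synthesis procedure might look like, the sketch below models one procedure as a small record and flattens it into subject-predicate-object triples of the kind a knowledge graph stores. Everything in it is an assumption made for illustration: the `Step` and `Synthesis` classes, the `ex:` prefix, the predicate names, and the example values are hypothetical placeholders, not the synthesis ontology or data described in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a synthesis procedure (hypothetical schema)."""
    action: str                                  # e.g. "Dissolve", "Heat"
    params: dict = field(default_factory=dict)   # free-form step parameters


@dataclass
class Synthesis:
    """A structured synthesis record (hypothetical schema)."""
    product: str
    reactants: list
    steps: list


def to_triples(syn, base="ex:"):
    """Flatten a Synthesis into (subject, predicate, object) triples."""
    s = f"{base}synthesis/{syn.product}"
    triples = [(s, "ex:hasProduct", syn.product)]
    for reactant in syn.reactants:
        triples.append((s, "ex:hasReactant", reactant))
    for i, step in enumerate(syn.steps, start=1):
        node = f"{s}/step{i}"                    # one node per ordered step
        triples.append((s, "ex:hasStep", node))
        triples.append((node, "ex:hasOrder", i))
        triples.append((node, "ex:hasAction", step.action))
        for key, value in sorted(step.params.items()):
            triples.append((node, f"ex:{key}", value))
    return triples


# Placeholder values only; not taken from any real procedure.
example = Synthesis(
    product="MOP-example",
    reactants=["metal salt", "organic linker"],
    steps=[
        Step("Dissolve", {"solvent": "DMF"}),
        Step("Heat", {"temperature_C": 85, "duration_h": 24}),
    ],
)

triples = to_triples(example)
for triple in triples:
    print(triple)
```

Giving each step its own node with an explicit order keeps the sequence of operations recoverable after flattening, which is one plausible reason to prefer a step-wise representation over a single free-text field.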
