An automatic end-to-end chemical synthesis development platform powered by large language models

IF 14.7 1区综合性期刊 Q1 MULTIDISCIPLINARY SCIENCES

Nature Communications Pub Date : 2024-11-23 DOI:10.1038/s41467-024-54457-x

Yixiang Ruan, Chenyin Lu, Ning Xu, Yuchen He, Yixin Chen, Jian Zhang, Jun Xuan, Jianzhang Pan, Qun Fang, Hanyu Gao, Xiaodong Shen, Ning Ye, Qiang Zhang, Yiming Mo

{"title":"An automatic end-to-end chemical synthesis development platform powered by large language models","authors":"Yixiang Ruan, Chenyin Lu, Ning Xu, Yuchen He, Yixin Chen, Jian Zhang, Jun Xuan, Jianzhang Pan, Qun Fang, Hanyu Gao, Xiaodong Shen, Ning Ye, Qiang Zhang, Yiming Mo","doi":"10.1038/s41467-024-54457-x","DOIUrl":null,"url":null,"abstract":"<p>The rapid emergence of large language model (LLM) technology presents promising opportunities to facilitate the development of synthetic reactions. In this work, we leveraged the power of GPT-4 to build an LLM-based reaction development framework (LLM-RDF) to handle fundamental tasks involved throughout the chemical synthesis development. LLM-RDF comprises six specialized LLM-based agents, including Literature Scouter, Experiment Designer, Hardware Executor, Spectrum Analyzer, Separation Instructor, and Result Interpreter, which are pre-prompted to accomplish the designated tasks. A web application with LLM-RDF as the backend was built to allow chemist users to interact with automated experimental platforms and analyze results via natural language, thus, eliminating the need for coding skills and ensuring accessibility for all chemists. We demonstrated the capabilities of LLM-RDF in guiding the end-to-end synthesis development process for the copper/TEMPO catalyzed aerobic alcohol oxidation to aldehyde reaction, including literature search and information extraction, substrate scope and condition screening, reaction kinetics study, reaction condition optimization, reaction scale-up and product purification. Furthermore, LLM-RDF’s broader applicability and versability was validated on various synthesis tasks of three distinct reactions (S<sub>N</sub>Ar reaction, photoredox C-C cross-coupling reaction, and heterogeneous photoelectrochemical reaction).</p>","PeriodicalId":19066,"journal":{"name":"Nature Communications","volume":"8 1","pages":""},"PeriodicalIF":14.7000,"publicationDate":"2024-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature Communications","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1038/s41467-024-54457-x","RegionNum":1,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}

引用次数: 0

Abstract

The rapid emergence of large language model (LLM) technology presents promising opportunities to facilitate the development of synthetic reactions. In this work, we leveraged the power of GPT-4 to build an LLM-based reaction development framework (LLM-RDF) to handle fundamental tasks involved throughout the chemical synthesis development. LLM-RDF comprises six specialized LLM-based agents, including Literature Scouter, Experiment Designer, Hardware Executor, Spectrum Analyzer, Separation Instructor, and Result Interpreter, which are pre-prompted to accomplish the designated tasks. A web application with LLM-RDF as the backend was built to allow chemist users to interact with automated experimental platforms and analyze results via natural language, thus, eliminating the need for coding skills and ensuring accessibility for all chemists. We demonstrated the capabilities of LLM-RDF in guiding the end-to-end synthesis development process for the copper/TEMPO catalyzed aerobic alcohol oxidation to aldehyde reaction, including literature search and information extraction, substrate scope and condition screening, reaction kinetics study, reaction condition optimization, reaction scale-up and product purification. Furthermore, LLM-RDF’s broader applicability and versability was validated on various synthesis tasks of three distinct reactions (S_NAr reaction, photoredox C-C cross-coupling reaction, and heterogeneous photoelectrochemical reaction).

Abstract Image

查看原文本刊更多论文

由大型语言模型驱动的端到端化学合成自动开发平台

大型语言模型（LLM）技术的迅速兴起为促进合成反应的开发提供了大有可为的机会。在这项工作中，我们利用 GPT-4 的强大功能，构建了一个基于 LLM 的反应开发框架（LLM-RDF），以处理整个化学合成开发过程中涉及的基本任务。LLM-RDF 由六个基于 LLM 的专门代理组成，包括文献搜寻器、实验设计者、硬件执行器、光谱分析器、分离指导器和结果解释器，它们会预先被提示完成指定任务。我们建立了一个以 LLM-RDF 为后台的网络应用程序，让化学家用户能够通过自然语言与自动化实验平台互动并分析结果，从而无需编码技能，确保所有化学家都能使用。我们展示了 LLM-RDF 在指导铜/TEMPO 催化有氧醇氧化成醛反应的端到端合成开发过程中的能力，包括文献检索和信息提取、底物范围和条件筛选、反应动力学研究、反应条件优化、反应放大和产物纯化。此外，LLM-RDF 在三个不同反应（SNAr 反应、光氧化 C-C 交叉偶联反应和异相光电化学反应）的各种合成任务中验证了其更广泛的适用性和通用性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Nature Communications Biological Science Disciplines-

CiteScore

24.90

自引率

2.40%

发文量

6928

审稿时长

3.7 months

期刊介绍： Nature Communications, an open-access journal, publishes high-quality research spanning all areas of the natural sciences. Papers featured in the journal showcase significant advances relevant to specialists in each respective field. With a 2-year impact factor of 16.6 (2022) and a median time of 8 days from submission to the first editorial decision, Nature Communications is committed to rapid dissemination of research findings. As a multidisciplinary journal, it welcomes contributions from biological, health, physical, chemical, Earth, social, mathematical, applied, and engineering sciences, aiming to highlight important breakthroughs within each domain.