{"title":"大型语言模型,加速有机化学合成","authors":"Yu Zhang, Yang Han, Shuai Chen, Ruijie Yu, Xin Zhao, Xianbin Liu, Kaipeng Zeng, Mengdi Yu, Jidong Tian, Feng Zhu, Xiaokang Yang, Yaohui Jin, Yanyan Xu","doi":"10.1038/s42256-025-01066-y","DOIUrl":null,"url":null,"abstract":"<p>Chemical synthesis, as a foundational methodology in the creation of transformative molecules, exerts substantial influence across diverse sectors from life sciences to materials and energy. Current chemical synthesis practices emphasize laborious and costly trial-and-error workflows, underscoring the urgent needs for advanced AI assistants. Recently, large language models, typified by GPT-4, have been introduced as an efficient tool to facilitate scientific research. Here we present Chemma, a fully fine-tuned large language model with 1.28 million pairs of questions and answers about reactions, as an assistant to accelerate organic chemistry synthesis. Chemma surpasses the best-known results in multiple chemical tasks, for example, single-step retrosynthesis and yield prediction, which highlights the potential of general artificial intelligence for organic chemistry. By predicting yields across the experimental reaction space, Chemma significantly improves the reaction exploration capability of Bayesian optimization. More importantly, integrated in an active learning framework, Chemma exhibits advanced potentials of autonomously experimental exploration and optimization in open reaction spaces. For an unreported Suzuki–Miyaura cross-coupling reaction of cyclic aminoboronates and aryl halides for the synthesis of α-aryl N-heterocycles, the human–artificial intelligence collaboration successfully explored a suitable ligand (tri(1-adamantyl)phosphine) and solvent (1,4-dioxane) within only 15 runs, achieving an isolated yield of 67%. These results reveal that, without quantum-chemical calculations, Chemma can comprehend and extract chemical insights from reaction data, in a manner akin to human experts. This work opens avenues for accelerating organic chemistry synthesis with adapted large language models.</p>","PeriodicalId":48533,"journal":{"name":"Nature Machine Intelligence","volume":"41 1","pages":""},"PeriodicalIF":23.9000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Large language models to accelerate organic chemistry synthesis\",\"authors\":\"Yu Zhang, Yang Han, Shuai Chen, Ruijie Yu, Xin Zhao, Xianbin Liu, Kaipeng Zeng, Mengdi Yu, Jidong Tian, Feng Zhu, Xiaokang Yang, Yaohui Jin, Yanyan Xu\",\"doi\":\"10.1038/s42256-025-01066-y\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Chemical synthesis, as a foundational methodology in the creation of transformative molecules, exerts substantial influence across diverse sectors from life sciences to materials and energy. Current chemical synthesis practices emphasize laborious and costly trial-and-error workflows, underscoring the urgent needs for advanced AI assistants. Recently, large language models, typified by GPT-4, have been introduced as an efficient tool to facilitate scientific research. Here we present Chemma, a fully fine-tuned large language model with 1.28 million pairs of questions and answers about reactions, as an assistant to accelerate organic chemistry synthesis. Chemma surpasses the best-known results in multiple chemical tasks, for example, single-step retrosynthesis and yield prediction, which highlights the potential of general artificial intelligence for organic chemistry. By predicting yields across the experimental reaction space, Chemma significantly improves the reaction exploration capability of Bayesian optimization. More importantly, integrated in an active learning framework, Chemma exhibits advanced potentials of autonomously experimental exploration and optimization in open reaction spaces. For an unreported Suzuki–Miyaura cross-coupling reaction of cyclic aminoboronates and aryl halides for the synthesis of α-aryl N-heterocycles, the human–artificial intelligence collaboration successfully explored a suitable ligand (tri(1-adamantyl)phosphine) and solvent (1,4-dioxane) within only 15 runs, achieving an isolated yield of 67%. These results reveal that, without quantum-chemical calculations, Chemma can comprehend and extract chemical insights from reaction data, in a manner akin to human experts. This work opens avenues for accelerating organic chemistry synthesis with adapted large language models.</p>\",\"PeriodicalId\":48533,\"journal\":{\"name\":\"Nature Machine Intelligence\",\"volume\":\"41 1\",\"pages\":\"\"},\"PeriodicalIF\":23.9000,\"publicationDate\":\"2025-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Nature Machine Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1038/s42256-025-01066-y\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1038/s42256-025-01066-y","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Large language models to accelerate organic chemistry synthesis
Chemical synthesis, as a foundational methodology in the creation of transformative molecules, exerts substantial influence across diverse sectors from life sciences to materials and energy. Current chemical synthesis practices emphasize laborious and costly trial-and-error workflows, underscoring the urgent needs for advanced AI assistants. Recently, large language models, typified by GPT-4, have been introduced as an efficient tool to facilitate scientific research. Here we present Chemma, a fully fine-tuned large language model with 1.28 million pairs of questions and answers about reactions, as an assistant to accelerate organic chemistry synthesis. Chemma surpasses the best-known results in multiple chemical tasks, for example, single-step retrosynthesis and yield prediction, which highlights the potential of general artificial intelligence for organic chemistry. By predicting yields across the experimental reaction space, Chemma significantly improves the reaction exploration capability of Bayesian optimization. More importantly, integrated in an active learning framework, Chemma exhibits advanced potentials of autonomously experimental exploration and optimization in open reaction spaces. For an unreported Suzuki–Miyaura cross-coupling reaction of cyclic aminoboronates and aryl halides for the synthesis of α-aryl N-heterocycles, the human–artificial intelligence collaboration successfully explored a suitable ligand (tri(1-adamantyl)phosphine) and solvent (1,4-dioxane) within only 15 runs, achieving an isolated yield of 67%. These results reveal that, without quantum-chemical calculations, Chemma can comprehend and extract chemical insights from reaction data, in a manner akin to human experts. This work opens avenues for accelerating organic chemistry synthesis with adapted large language models.
期刊介绍:
Nature Machine Intelligence is a distinguished publication that presents original research and reviews on various topics in machine learning, robotics, and AI. Our focus extends beyond these fields, exploring their profound impact on other scientific disciplines, as well as societal and industrial aspects. We recognize limitless possibilities wherein machine intelligence can augment human capabilities and knowledge in domains like scientific exploration, healthcare, medical diagnostics, and the creation of safe and sustainable cities, transportation, and agriculture. Simultaneously, we acknowledge the emergence of ethical, social, and legal concerns due to the rapid pace of advancements.
To foster interdisciplinary discussions on these far-reaching implications, Nature Machine Intelligence serves as a platform for dialogue facilitated through Comments, News Features, News & Views articles, and Correspondence. Our goal is to encourage a comprehensive examination of these subjects.
Similar to all Nature-branded journals, Nature Machine Intelligence operates under the guidance of a team of skilled editors. We adhere to a fair and rigorous peer-review process, ensuring high standards of copy-editing and production, swift publication, and editorial independence.