Automated Retrosynthesis Planning of Macromolecules Using Large Language Models and Knowledge Graphs
Qinyu Ma, Yuhao Zhou, Jianfeng Li
Macromolecular Rapid Communications, e2500065. Published 2025-02-27.
DOI: 10.1002/marc.202500065
Abstract
Identifying reliable synthesis pathways in materials chemistry is a complex task, particularly in polymer science, where the nomenclature of macromolecules is intricate and often nonunique. To address this challenge, an agent system that integrates large language models (LLMs) and knowledge graphs is proposed. The system uses the strong ability of LLMs to recognize and extract chemical substance names, and it stores the extracted data in a structured knowledge graph. With these components, it fully automates the retrieval of relevant literature, the extraction of reaction data, database querying, the construction of retrosynthetic pathway trees, the further expansion of these trees through retrieval of additional literature, and the recommendation of optimal reaction pathways. To account for the complex interdependencies among chemical reactants, a novel Multi-branched Reaction Pathway Search Algorithm (MBRPS) is proposed to help identify all valid multi-branched reaction pathways, which arise when a single product decomposes into multiple reaction intermediates. Previous studies, by contrast, are limited to cases in which a product decomposes into at most one reaction intermediate. This work represents the first attempt to develop a fully automated, LLM-powered retrosynthesis planning agent tailored specifically to macromolecules. Applied to polyimide synthesis, the approach constructs a retrosynthetic pathway tree containing hundreds of pathways and recommends optimized routes, including both known and novel pathways.
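The abstract does not describe how MBRPS works internally. The sketch below is therefore only a generic illustration of what "multi-branched" means. When a reaction has several reactants that are themselves intermediates, each reactant needs its own sub-route, and a complete pathway is one choice of sub-route per reactant. That makes it a Cartesian product rather than the single chain a one-intermediate search would follow. This is a plain AND-OR tree enumeration, not the authors' algorithm. The function name `enumerate_routes`, the dictionary-based reaction graph (standing in for the knowledge graph), and all molecule names in the toy data are hypothetical placeholders, not data from the paper.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical route representation: a leaf is a starting material (str);
# an inner node is (product, [one sub-route per reactant]).
Route = Union[str, Tuple[str, List["Route"]]]


def enumerate_routes(
    target: str,
    reactions: Dict[str, List[Tuple[str, ...]]],
    starting: FrozenSet[str],
    _path: FrozenSet[str] = frozenset(),
) -> List[Route]:
    """Enumerate every multi-branched route from `target` down to starting materials.

    Generic AND-OR enumeration (an illustrative stand-in, not MBRPS):
    OR over alternative reactions, AND over the reactants of one reaction.
    """
    if target in starting:
        return [target]
    if target in _path:  # cycle guard: target already appears on this branch
        return []
    path = _path | {target}
    routes: List[Route] = []
    for reactants in reactions.get(target, []):
        # Each reactant may be an intermediate with several sub-routes of its own.
        sub_route_lists = [enumerate_routes(r, reactions, starting, path) for r in reactants]
        if any(not subs for subs in sub_route_lists):
            continue  # some reactant cannot be traced back to starting materials
        # Multi-branching: combine one sub-route per reactant (Cartesian product).
        for combo in product(*sub_route_lists):
            routes.append((target, list(combo)))
    return routes


# Toy reaction graph with hypothetical placeholder molecules:
# product -> list of alternative reactant tuples.
reactions = {
    "polyimide": [("dianhydride", "diamine")],
    "dianhydride": [("tetraacid",), ("precursor_A",)],
    "diamine": [("dinitro",), ("precursor_B",)],
    "dinitro": [("halide", "phenol")],
}
starting = frozenset({"tetraacid", "precursor_A", "precursor_B", "halide", "phenol"})

routes = enumerate_routes("polyimide", reactions, starting)
print(len(routes))  # 4: 2 dianhydride routes x 2 diamine routes
```

Exhaustive enumeration like this grows multiplicatively with the number of alternatives per intermediate, which is consistent with a tree of hundreds of pathways for a single target. A practical planner would pair it with ranking or pruning before recommending routes.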
About the journal:
Macromolecular Rapid Communications publishes original research in polymer science, ranging from chemistry and physics of polymers to polymers in materials science and life sciences.