Authors: Yupeng Liu, Han Zhang, Rui Hu
Journal: CAAI Transactions on Intelligence Technology, vol. 10, no. 3, pp. 902-911
Publication date: 2025-04-01
Publication type: Journal Article
Impact factor: 7.3 (JCR Q1, Computer Science, Artificial Intelligence)
DOI: 10.1049/cit2.70005
URL: https://onlinelibrary.wiley.com/doi/10.1049/cit2.70005
Molecular Retrosynthesis Top-K Prediction Based on the Latent Generation Process
In organic synthesis, the core objective of retrosynthetic methods is to deduce possible synthetic routes and precursor molecules for complex target molecules. Traditional retrosynthetic methods, such as template-based retrosynthesis, offer high accuracy and interpretability for specific reaction types but are limited by the scope of the template library, which makes it difficult for them to adapt to new or uncommon reaction types. Sequence-to-sequence retrosynthetic prediction methods make prediction more flexible, but they often overlook the complexity of molecular graph structures and the actual interactions between atoms, which limits the accuracy and reliability of their predictions. To address these limitations, this paper proposes Molecular Retrosynthesis Top-k Prediction based on the Latent Generation Process (MRLGP), a method that uses latent variables from graph neural networks to model the generation process and produce a diverse set of reactants. Utilising an encoding method based on Graphormer, the authors also introduce a topology-aware positional encoding to better capture the interactions between atomic nodes in the molecular graph structure, thereby simulating the retrosynthetic process more accurately. The MRLGP model significantly enhances the accuracy and diversity of predictions by correlating discrete latent variables with the reactant generation process and progressively constructing molecular graphs using a variational autoregressive decoder. Experimental results on benchmark datasets such as USPTO-50K, USPTO-Full, and USPTO-DIVERSE demonstrate that MRLGP outperforms baseline models on multiple Top-k evaluation metrics. Additionally, ablation experiments conducted on the USPTO-50K dataset further validate the effectiveness of the methods used in the encoder and decoder parts of the model.
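The abstract reports results as Top-k metrics without defining them. As commonly used in retrosynthesis benchmarks, Top-k exact-match accuracy is the fraction of target molecules whose ground-truth reactant set appears among the model's first k ranked predictions. The sketch below illustrates that definition only; it is not the authors' evaluation code. The function names `normalize` and `top_k_accuracy` are hypothetical, the toy SMILES strings are made up, and the code assumes each molecule's SMILES string is already canonical (real evaluations usually canonicalize with a cheminformatics toolkit first).

```python
from typing import Sequence


def normalize(reactants: str) -> str:
    """Order the dot-separated molecules so 'A.B' and 'B.A' compare equal.

    Assumption: each individual SMILES string is already canonical.
    """
    return ".".join(sorted(reactants.split(".")))


def top_k_accuracy(
    predictions: Sequence[Sequence[str]],
    references: Sequence[str],
    k: int,
) -> float:
    """Fraction of targets whose reference reactants are in the first k predictions."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = 0
    for ranked, truth in zip(predictions, references):
        # Only the k highest-ranked candidates count for Top-k.
        candidates = {normalize(p) for p in ranked[:k]}
        if normalize(truth) in candidates:
            hits += 1
    return hits / len(references)


# Toy data, invented for illustration (not taken from any USPTO split).
preds = [
    ["CCO.CC(=O)O", "CCO.CC(=O)Cl"],              # truth ranked first
    ["c1ccccc1Br", "OB(O)c1ccccc1.Brc1ccccc1"],   # truth ranked second, molecules swapped
    ["CCN", "CCO"],                               # truth missing
]
refs = ["CC(=O)O.CCO", "Brc1ccccc1.OB(O)c1ccccc1", "CCBr"]

for k in (1, 2):
    print(f"Top-{k} accuracy: {top_k_accuracy(preds, refs, k):.3f}")
```

On this toy data one of three targets is matched at k=1 and two of three at k=2, which shows why a model that produces a more diverse candidate set can score higher at larger k even when its first guess is wrong.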
About the journal:
CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. We are a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI) providing research which is openly accessible to read and share worldwide.