Molecular Retrosynthesis Top-K Prediction Based on the Latent Generation Process

IF 7.3 | Zone 2, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

CAAI Transactions on Intelligence Technology, Pub Date: 2025-04-01, DOI: 10.1049/cit2.70005

Yupeng Liu, Han Zhang, Rui Hu

Journal Article. Volume 10, Issue 3, pp. 902-911. Full text: https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cit2.70005

Citations: 0

Abstract


In organic synthesis, the core objective of retrosynthetic methods is to deduce possible synthetic routes and precursor molecules for complex target molecules. Traditional approaches such as template-based retrosynthesis are accurate and interpretable for specific reaction types, but they are limited by the scope of the template library and adapt poorly to new or uncommon reaction types. Sequence-to-sequence prediction methods are more flexible, but they often overlook the complexity of molecular graph structures and the actual interactions between atoms, which limits the accuracy and reliability of their predictions. To address these limitations, this paper proposes Molecular Retrosynthesis Top-k Prediction based on the Latent Generation Process (MRLGP), a method that uses latent variables from graph neural networks to model the generation process and produce a diverse set of reactants. Using a Graphormer-based encoder, the authors also introduce a topology-aware positional encoding that better captures the interactions between atomic nodes in the molecular graph, and thereby models the retrosynthetic process more accurately. MRLGP improves both the accuracy and the diversity of predictions by correlating discrete latent variables with the reactant generation process and by constructing molecular graphs progressively with a variational autoregressive decoder. Experimental results on the benchmark datasets USPTO-50K, USPTO-Full, and USPTO-DIVERSE show that MRLGP outperforms baseline models on multiple Top-k evaluation metrics. Ablation experiments on USPTO-50K further validate the effectiveness of the methods used in the encoder and decoder parts of the model.
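The abstract reports Top-k evaluation metrics without defining them. In retrosynthesis benchmarks, Top-k accuracy is usually taken to be the fraction of target products whose ground-truth reactant set appears among the model's k highest-ranked candidates. The following is a minimal sketch of that metric under stated assumptions, not the authors' evaluation code: the names `normalize` and `top_k_accuracy` are hypothetical, reactant sets are assumed to be '.'-separated SMILES strings, and each fragment is assumed to be canonicalised already (a real pipeline would canonicalise with a cheminformatics toolkit first).

```python
from typing import Dict, Optional, Sequence


def normalize(reactants: str) -> str:
    """Return an order-insensitive key for a '.'-separated reactant set.

    Assumption: every SMILES fragment is already in canonical form,
    so sorting the fragments is enough to compare two reactant sets.
    """
    return ".".join(sorted(reactants.split(".")))


def top_k_accuracy(
    predictions: Sequence[Sequence[str]],
    references: Sequence[str],
    ks: Sequence[int] = (1, 3, 5, 10),
) -> Dict[int, float]:
    """Fraction of examples whose reference is within the first k candidates.

    predictions[i] is the ranked candidate list (best first) for example i;
    references[i] is the ground-truth reactant set for example i.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("at least one example is required")

    hits = {k: 0 for k in ks}
    for ranked, truth in zip(predictions, references):
        target = normalize(truth)
        keys = [normalize(candidate) for candidate in ranked]
        # Zero-based rank of the first exact match, or None if absent.
        rank: Optional[int] = keys.index(target) if target in keys else None
        for k in ks:
            if rank is not None and rank < k:
                hits[k] += 1

    total = len(references)
    return {k: hits[k] / total for k in ks}


if __name__ == "__main__":
    # Toy data for illustration only; not taken from any USPTO dataset.
    preds = [
        ["CCO.CC(=O)O", "CC(=O)Cl.CCO"],  # reference matches the 2nd candidate
        ["c1ccccc1Br", "c1ccccc1I"],      # reference is not among candidates
    ]
    refs = ["CCO.CC(=O)Cl", "c1ccccc1Cl"]
    print(top_k_accuracy(preds, refs, ks=(1, 3)))
```

Because the comparison is an exact match on the normalised string, a model that proposes several chemically distinct candidates has more chances to hit the reference at larger k, which is why diversity of the generated reactant sets matters for Top-k scores.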


Source journal

CAAI Transactions on Intelligence Technology (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 11.00

Self-citation rate: 3.90%

Articles published: 134

Review time: 35 weeks

About the journal: CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. It is a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI), making its research openly accessible to read and share worldwide.