Improving drug discovery with a hybrid deep generative model using reinforcement learning trained on a Bayesian docking approximation

IF 3 3区生物学 Q3 BIOCHEMISTRY & MOLECULAR BIOLOGY

Journal of Computer-Aided Molecular Design Pub Date : 2023-08-08 DOI:10.1007/s10822-023-00523-3

Youjin Xiong, Yiqing Wang, Yisheng Wang, Chenmei Li, Peng Yusong, Junyu Wu, Yiqing Wang, Lingyun Gu, Christopher J. Butch

{"title":"Improving drug discovery with a hybrid deep generative model using reinforcement learning trained on a Bayesian docking approximation","authors":"Youjin Xiong, Yiqing Wang, Yisheng Wang, Chenmei Li, Peng Yusong, Junyu Wu, Yiqing Wang, Lingyun Gu, Christopher J. Butch","doi":"10.1007/s10822-023-00523-3","DOIUrl":null,"url":null,"abstract":"<div><p>Generative approaches to molecular design are an area of intense study in recent years as a method to generate new pharmaceuticals with desired properties. Often though, these types of efforts are constrained by limited experimental activity data, resulting in either models that generate molecules with poor performance or models that are overfit and produce close analogs of known molecules. In this paper, we reduce this data dependency for the generation of new chemotypes by incorporating docking scores of known and de novo molecules to expand the applicability domain of the reward function and diversify the compounds generated during reinforcement learning. Our approach employs a deep generative model initially trained using a combination of limited known drug activity and an approximate docking score provided by a second machine learned Bayes regression model, with final evaluation of high scoring compounds by a full docking simulation. This strategy results in molecules with docking scores improved by 10–20% compared to molecules of similar size, while being 130 × faster than a docking only approach on a typical GPU workstation. We also show that the increased docking scores correlate with (1) docking poses with interactions similar to known inhibitors and (2) result in higher MM-GBSA binding energies comparable to the energies of known DDR1 inhibitors, demonstrating that the Bayesian model contains sufficient information for the network to learn to efficiently interact with the binding pocket during reinforcement learning. This outcome shows that the combination of the learned latent molecular representation along with the feature-based docking regression is sufficient for reinforcement learning to infer the relationship between the molecules and the receptor binding site, which suggest that our method can be a powerful tool for the discovery of new chemotypes with potential therapeutic applications.</p></div>","PeriodicalId":621,"journal":{"name":"Journal of Computer-Aided Molecular Design","volume":"37 11","pages":"507 - 517"},"PeriodicalIF":3.0000,"publicationDate":"2023-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10822-023-00523-3.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computer-Aided Molecular Design","FirstCategoryId":"99","ListUrlMain":"https://link.springer.com/article/10.1007/s10822-023-00523-3","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

Generative approaches to molecular design are an area of intense study in recent years as a method to generate new pharmaceuticals with desired properties. Often though, these types of efforts are constrained by limited experimental activity data, resulting in either models that generate molecules with poor performance or models that are overfit and produce close analogs of known molecules. In this paper, we reduce this data dependency for the generation of new chemotypes by incorporating docking scores of known and de novo molecules to expand the applicability domain of the reward function and diversify the compounds generated during reinforcement learning. Our approach employs a deep generative model initially trained using a combination of limited known drug activity and an approximate docking score provided by a second machine learned Bayes regression model, with final evaluation of high scoring compounds by a full docking simulation. This strategy results in molecules with docking scores improved by 10–20% compared to molecules of similar size, while being 130 × faster than a docking only approach on a typical GPU workstation. We also show that the increased docking scores correlate with (1) docking poses with interactions similar to known inhibitors and (2) result in higher MM-GBSA binding energies comparable to the energies of known DDR1 inhibitors, demonstrating that the Bayesian model contains sufficient information for the network to learn to efficiently interact with the binding pocket during reinforcement learning. This outcome shows that the combination of the learned latent molecular representation along with the feature-based docking regression is sufficient for reinforcement learning to infer the relationship between the molecules and the receptor binding site, which suggest that our method can be a powerful tool for the discovery of new chemotypes with potential therapeutic applications.

Abstract Image

查看原文本刊更多论文

利用基于贝叶斯对接近似训练的强化学习的混合深度生成模型改进药物发现

近年来，分子设计的生成方法作为一种产生具有期望性能的新药物的方法，受到了广泛的研究。通常，这些类型的努力受到有限的实验活动数据的限制，导致模型产生性能较差的分子或模型过拟合并产生已知分子的接近类似物。在本文中，我们通过结合已知分子和新生分子的对接分数来扩大奖励函数的适用范围，并使强化学习过程中产生的化合物多样化，从而减少了生成新化学型的数据依赖性。我们的方法采用了一个深度生成模型，该模型最初使用有限的已知药物活性和由第二个机器学习贝叶斯回归模型提供的近似对接分数的组合进行训练，并通过完整的对接模拟对高分化合物进行最终评估。这种策略的结果是，与类似大小的分子相比，具有对接分数的分子提高了10-20%，同时比典型GPU工作站上仅对接的方法快130倍。我们还发现，对接分数的增加与(1)与已知抑制剂的相互作用相似的对接姿态和(2)导致与已知DDR1抑制剂的能量相当的更高的MM-GBSA结合能相关，这表明贝叶斯模型包含足够的信息，使网络在强化学习过程中学习有效地与结合口袋相互作用。这一结果表明，将学习到的潜在分子表示与基于特征的对接回归相结合，足以用于强化学习来推断分子与受体结合位点之间的关系，这表明我们的方法可以成为发现具有潜在治疗应用的新化学型的有力工具。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Journal of Computer-Aided Molecular Design 生物-计算机：跨学科应用

CiteScore

8.00

自引率

8.60%

发文量

审稿时长

3 months

期刊介绍： The Journal of Computer-Aided Molecular Design provides a form for disseminating information on both the theory and the application of computer-based methods in the analysis and design of molecules. The scope of the journal encompasses papers which report new and original research and applications in the following areas: - theoretical chemistry; - computational chemistry; - computer and molecular graphics; - molecular modeling; - protein engineering; - drug design; - expert systems; - general structure-property relationships; - molecular dynamics; - chemical database development and usage.