Generative Model for Small Molecules with Latent Space RL Fine-Tuning to Protein Targets

Ulrich A. Mbou Sob, Qiulin Li, Miguel Arbesú, Oliver Bent, Andries P. Smit, Arnu Pretorius
arXiv - QuanBio - Biomolecules, published 2024-07-02. DOI: arxiv-2407.13780 (https://doi.org/arxiv-2407.13780)

Abstract

A specific challenge with deep learning approaches for molecule generation is generating both syntactically valid and chemically plausible molecular string representations. To address this, we propose a novel generative latent-variable transformer model for small molecules that leverages a recently proposed molecular string representation called SAFE. We introduce a modification to SAFE to reduce the number of invalid fragmented molecules generated during training and use this to train our model. Our experiments show that our model can generate novel molecules with a validity rate > 90% and a fragmentation rate < 1% by sampling from a latent space. By fine-tuning the model using reinforcement learning to improve molecular docking, we significantly increase the number of hit candidates for five specific protein targets compared to the pre-trained model, nearly doubling this number for certain targets. Additionally, our top 5% mean docking scores are comparable to the current state-of-the-art (SOTA), and we marginally outperform SOTA on three of the five targets.
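
The metrics named in the abstract (validity rate, fragmentation rate, and top 5% mean docking score) can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes that molecules are available as SMILES strings, that a molecule counts as fragmented when its SMILES contains the `.` component separator, that validity is decided by a caller-supplied checker (for example one built on a cheminformatics toolkit), and that lower (more negative) docking scores are better. All function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def fragmentation_rate(smiles: Sequence[str]) -> float:
    """Fraction of SMILES strings with more than one disconnected component.

    Assumption: a '.' in a SMILES string separates disconnected components,
    so its presence marks the molecule as fragmented.
    """
    if not smiles:
        return 0.0
    return sum(1 for s in smiles if "." in s) / len(smiles)


def validity_rate(smiles: Sequence[str], is_valid: Callable[[str], bool]) -> float:
    """Fraction of strings accepted by a caller-supplied validity checker.

    The checker is an assumption of this sketch; in practice it would
    typically wrap a toolkit's SMILES parser.
    """
    if not smiles:
        return 0.0
    return sum(1 for s in smiles if is_valid(s)) / len(smiles)


def top_fraction_mean(scores: Sequence[float], fraction: float = 0.05) -> float:
    """Mean of the best `fraction` of docking scores.

    Assumption: lower scores are better, so the best scores are the
    smallest values. At least one score is always included.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    k = max(1, math.ceil(fraction * len(scores)))
    best = sorted(scores)[:k]
    return sum(best) / k
```

With these definitions, the abstract's headline figures correspond to `validity_rate(...) > 0.9` and `fragmentation_rate(...) < 0.01` over molecules sampled from the latent space, and `top_fraction_mean(scores, 0.05)` computed per protein target.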