利用语言模型对基于结构的药物设计进行分子片段和几何感知标记化。

ArXiv Pub Date : 2024-08-19
Cong Fu, Xiner Li, Blake Olson, Heng Ji, Shuiwang Ji
{"title":"利用语言模型对基于结构的药物设计进行分子片段和几何感知标记化。","authors":"Cong Fu, Xiner Li, Blake Olson, Heng Ji, Shuiwang Ji","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>Structure-based drug design (SBDD) is crucial for developing specific and effective therapeutics against protein targets but remains challenging due to complex protein-ligand interactions and vast chemical space. Although language models (LMs) have excelled in natural language processing, their application in SBDD is underexplored. To bridge this gap, we introduce a method, known as Frag2Seq, to apply LMs to SBDD by generating molecules in a fragment-based manner in which fragments correspond to functional modules. We transform 3D molecules into fragment-informed sequences using SE(3)-equivariant molecule and fragment local frames, extracting SE(3)-invariant sequences that preserve geometric information of 3D fragments. Furthermore, we incorporate protein pocket embeddings obtained from a pre-trained inverse folding model into the LMs via cross-attention to capture protein-ligand interaction, enabling effective target-aware molecule generation. Benefiting from employing LMs with fragment-based generation and effective protein context encoding, our model achieves the best performance on binding vina score and chemical properties such as QED and Lipinski, which shows our model's efficacy in generating drug-like ligands with higher binding affinity against target proteins. Moreover, our method also exhibits higher sampling efficiency compared to atom-based autoregressive and diffusion baselines with at most ~300x speedup.</p>","PeriodicalId":93888,"journal":{"name":"ArXiv","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11383437/pdf/","citationCount":"0","resultStr":"{\"title\":\"Fragment and Geometry Aware Tokenization of Molecules for Structure-Based Drug Design Using Language Models.\",\"authors\":\"Cong Fu, Xiner Li, Blake Olson, Heng Ji, Shuiwang Ji\",\"doi\":\"\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Structure-based drug design (SBDD) is crucial for developing specific and effective therapeutics against protein targets but remains challenging due to complex protein-ligand interactions and vast chemical space. Although language models (LMs) have excelled in natural language processing, their application in SBDD is underexplored. To bridge this gap, we introduce a method, known as Frag2Seq, to apply LMs to SBDD by generating molecules in a fragment-based manner in which fragments correspond to functional modules. We transform 3D molecules into fragment-informed sequences using SE(3)-equivariant molecule and fragment local frames, extracting SE(3)-invariant sequences that preserve geometric information of 3D fragments. Furthermore, we incorporate protein pocket embeddings obtained from a pre-trained inverse folding model into the LMs via cross-attention to capture protein-ligand interaction, enabling effective target-aware molecule generation. Benefiting from employing LMs with fragment-based generation and effective protein context encoding, our model achieves the best performance on binding vina score and chemical properties such as QED and Lipinski, which shows our model's efficacy in generating drug-like ligands with higher binding affinity against target proteins. Moreover, our method also exhibits higher sampling efficiency compared to atom-based autoregressive and diffusion baselines with at most ~300x speedup.</p>\",\"PeriodicalId\":93888,\"journal\":{\"name\":\"ArXiv\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11383437/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ArXiv\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ArXiv","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

基于结构的药物设计(SBDD)对于开发针对蛋白质靶点的特异性有效疗法至关重要,但由于复杂的蛋白质配体相互作用和广阔的化学空间,SBDD 仍然充满挑战。尽管语言模型(LMs)在自然语言处理方面表现出色,但其在 SBDD 中的应用还未得到充分探索。为了弥补这一差距,我们引入了一种称为 Frag2Seq 的方法,通过基于片段生成分子的方式将语言模型应用于 SBDD,其中片段对应于功能模块。我们使用 SE(3)-equivariant 分子和片段局部框架将三维分子转化为片段信息序列,提取出保留三维片段几何信息的 SE(3)-invariant 序列。此外,我们还通过交叉关注将从预先训练的反折叠模型中获得的蛋白质口袋嵌入纳入 LMs,以捕捉蛋白质与配体之间的相互作用,从而实现有效的目标感知分子生成。得益于基于片段生成的 LMs 和有效的蛋白质上下文编码,我们的模型在结合 Vina 分数和化学特性(如 QED 和 Lipinski)方面取得了最佳性能,这表明我们的模型在生成与目标蛋白质具有更高结合亲和力的类药物配体方面非常有效。此外,与基于原子的自回归和扩散基线相比,我们的方法还具有更高的采样效率,速度最多可提高约 300 倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Fragment and Geometry Aware Tokenization of Molecules for Structure-Based Drug Design Using Language Models.

Structure-based drug design (SBDD) is crucial for developing specific and effective therapeutics against protein targets but remains challenging due to complex protein-ligand interactions and vast chemical space. Although language models (LMs) have excelled in natural language processing, their application in SBDD is underexplored. To bridge this gap, we introduce a method, known as Frag2Seq, to apply LMs to SBDD by generating molecules in a fragment-based manner in which fragments correspond to functional modules. We transform 3D molecules into fragment-informed sequences using SE(3)-equivariant molecule and fragment local frames, extracting SE(3)-invariant sequences that preserve geometric information of 3D fragments. Furthermore, we incorporate protein pocket embeddings obtained from a pre-trained inverse folding model into the LMs via cross-attention to capture protein-ligand interaction, enabling effective target-aware molecule generation. Benefiting from employing LMs with fragment-based generation and effective protein context encoding, our model achieves the best performance on binding vina score and chemical properties such as QED and Lipinski, which shows our model's efficacy in generating drug-like ligands with higher binding affinity against target proteins. Moreover, our method also exhibits higher sampling efficiency compared to atom-based autoregressive and diffusion baselines with at most ~300x speedup.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信