Normalized Protein–Ligand Distance Likelihood Score for End-to-End Blind Docking and Virtual Screening

Impact factor 5.3; Region 2 (Chemistry); JCR Q1 (Chemistry, Medicinal)

Journal of Chemical Information and Modeling. Pub Date: 2025-01-17. DOI: 10.1021/acs.jcim.4c01014

Song Xia, Yaowen Gu, and Yingkai Zhang*

Volume 65, Issue 3, pages 1101–1114. Full text: https://pubs.acs.org/doi/10.1021/acs.jcim.4c01014

Citations: 0

Abstract

Molecular docking is a critical task in structure-based virtual screening. Recent advances have demonstrated the efficacy of diffusion-based generative models for blind docking. However, these models do not inherently estimate protein–ligand binding strength and thus cannot be applied directly to virtual screening. Protein–ligand scoring functions are fast, approximate computational methods for evaluating the binding strength between a protein and a ligand. In this work, we introduce the normalized mixture density network (NMDN) score, a deep learning (DL)-based scoring function that learns the probability density distribution of distances between protein residues and ligand atoms. The NMDN score addresses limitations observed in existing DL scoring functions and performs robustly in both pose selection and virtual screening. Additionally, we incorporate an interaction module that predicts the experimental binding affinity, making full use of the learned protein and ligand representations. Finally, we present an end-to-end blind docking and virtual screening protocol named DiffDock-NMDN. For each protein–ligand pair, we employ DiffDock to sample multiple poses, use the NMDN score to select the optimal binding pose, and then estimate the binding affinity using scoring functions. Our protocol achieves an average enrichment factor of 4.96 on the LIT-PCBA data set, proving effective in real-world drug discovery scenarios where binder information is limited. This work not only presents a robust DL-based scoring function with superior pose selection and virtual screening capabilities but also offers a blind docking protocol and benchmarks to guide future scoring function development.
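The two quantities the abstract relies on, a distance likelihood under a mixture density and an enrichment factor, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, array shapes, and the use of Gaussian components are assumptions made for illustration, and the normalization step that gives the NMDN score its name is not described in the abstract and is not reproduced here. The enrichment-factor cutoff (`fraction`) is likewise a free parameter, since the abstract does not state which one was used.

```python
import numpy as np


def mixture_log_density(d, weights, means, sigmas):
    """Log density of each distance under its own Gaussian mixture.

    d: (P,) distances for P residue-atom pairs.
    weights, means, sigmas: (P, K) mixture parameters (hypothetical
    stand-ins for what a trained network would predict per pair).
    """
    d = np.asarray(d, dtype=float)[:, None]
    log_comp = (
        np.log(weights)
        - np.log(sigmas)
        - 0.5 * np.log(2.0 * np.pi)
        - 0.5 * ((d - means) / sigmas) ** 2
    )
    # Log-sum-exp over the K components, stabilized by the row maximum.
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True)))[:, 0]


def pose_score(distances, weights, means, sigmas):
    """Unnormalized pose score: summed log-likelihood over all pairs."""
    return float(mixture_log_density(distances, weights, means, sigmas).sum())


def select_pose(pose_distances, weights, means, sigmas):
    """Return the index of the highest-scoring pose and all pose scores."""
    scores = [pose_score(d, weights, means, sigmas) for d in pose_distances]
    return int(np.argmax(scores)), scores


def enrichment_factor(scores, is_active, fraction=0.01):
    """Hit rate in the top-scoring fraction divided by the overall hit rate."""
    scores = np.asarray(scores, dtype=float)
    is_active = np.asarray(is_active, dtype=bool)
    if not is_active.any():
        raise ValueError("enrichment factor is undefined without actives")
    n_top = max(1, int(np.ceil(fraction * len(scores))))
    top = np.argsort(-scores)[:n_top]  # highest scores first
    return float(is_active[top].mean() / is_active.mean())
```

In this sketch a pose whose residue-atom distances sit near the modes of the predicted mixtures receives a higher summed log-likelihood, so `select_pose` prefers it over a pose with implausible distances. `enrichment_factor` then measures how strongly a ranked compound list concentrates known actives near the top relative to random selection.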


Source journal: Journal of Chemical Information and Modeling (Chemistry: Multidisciplinary)

CiteScore: 9.80

Self-citation rate: 10.70%

Articles published: 529

Review time: 1.4 months

About the journal: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.