Accurate prediction of protein–ligand interactions by combining physical energy functions and graph-neural networks

IF 7.1 | CAS Zone 2 (Chemistry) | JCR Q1, CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics, vol. 16, no. 1. Published: 2024-11-04. DOI: 10.1186/s13321-024-00912-2

Yiyu Hong, Junsu Ha, Jaemin Sim, Chae Jo Lim, Kwang-Seok Oh, Ramakrishnan Chandrasekaran, Bomin Kim, Jieun Choi, Junsu Ko, Woong-Hee Shin, Juyong Lee


Abstract

We introduce an advanced model for predicting protein–ligand interactions that combines the strengths of graph neural networks with physics-based scoring methods. Existing structure-based machine-learning models for protein–ligand binding prediction often fall short in practical virtual screening, hindered by the intricacies of binding poses, the chemical diversity of drug-like molecules, and the scarcity of crystallographic data for protein–ligand complexes. To overcome these limitations, we propose a novel approach that fuses three independent neural network models. One classification model performs binary prediction on a given protein–ligand complex pose. The other two are regression models trained to predict, from an input complex structure, the binding affinity and the root-mean-square deviation (RMSD) of the ligand conformation. We trained the model to account for both the deviation between experimental and predicted binding affinities and the uncertainty in pose prediction. By integrating the outputs of the three neural networks with a physics-based scoring function, our model showed significantly improved performance in hit identification. Benchmark results with three independent decoy sets demonstrate that our model outperformed existing models in forward screening. It achieved top 1% enrichment factors of 32.7 and 23.1 on the CASF2016 and DUD-E benchmark sets, respectively. Results on the LIT-PCBA set further confirmed its higher average enrichment factors, underscoring the model’s efficiency and generalizability. The model’s practical applicability in hit discovery was further validated in an experimental screen for autotaxin inhibitors, in which 23 of 63 candidates were identified as active compounds.
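The enrichment factors quoted above (EF at the top 1%) follow the usual virtual-screening definition: the fraction of actives among the top-ranked x% of a library divided by the fraction of actives in the whole library. The sketch below computes it for a scored library. The function name, the higher-is-better score convention, and the rounding of the top-slice size are assumptions for illustration; this is not the paper's evaluation code, and individual benchmark suites may define the top slice slightly differently.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_active: Sequence[bool],
    top_fraction: float = 0.01,
) -> float:
    """EF at `top_fraction` of a ranked library (e.g. 0.01 for EF1%).

    Assumes higher score = more likely to bind; negate energy-like scores
    (where lower is better) before calling.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_actives == 0:
        raise ValueError("library must contain at least one active")
    # Size of the top slice, rounded to the nearest integer and at least 1.
    n_top = max(1, round(n_total * top_fraction))
    # Compound indices sorted best-first by score.
    ranked = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    actives_in_top = sum(1 for i in ranked[:n_top] if is_active[i])
    # Active rate in the top slice divided by the active rate of the whole library.
    return (actives_in_top / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    # Toy library: 1,000 compounds, 10 actives, score decreases with index.
    scores = [1000.0 - i for i in range(1000)]
    active_idx = {0, 2, 4, 6, 8, 500, 501, 502, 503, 504}
    labels = [i in active_idx for i in range(1000)]
    ef1 = enrichment_factor(scores, labels, top_fraction=0.01)
    print(f"EF1% = {ef1:.1f}")  # prints "EF1% = 50.0"
```

In the toy library, 5 of the 10 actives fall in the top 10 compounds, so EF1% = 0.5 / 0.01 = 50; with 10 actives among 1,000 compounds the ceiling at 1% is 100. By the same kind of arithmetic, 23 actives out of 63 experimentally tested candidates corresponds to a hit rate of about 36.5%.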

Scientific contribution

Our work introduces a novel training strategy for a protein–ligand binding affinity prediction model by integrating the outputs of three independent sub-models and utilizing expertly crafted decoy sets. The model performs strongly across multiple benchmarks (CASF2016, DUD-E, and LIT-PCBA), and its high enrichment factors on the LIT-PCBA benchmark demonstrate its potential to accelerate hit discovery.
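The abstract states that the outputs of the three sub-models are integrated with a physics-based scoring function, but it does not say how. The following is a purely hypothetical sketch of one simple way to fuse such signals: a weighted linear combination scored per pose, with each ligand ranked by its best pose. The class, field names, weights, and sign conventions are all assumptions for illustration and should not be read as the authors' method.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PoseOutputs:
    """Per-pose outputs; field names and units are hypothetical."""
    pose_prob: float      # classifier output probability for the binary pose label (0..1)
    pred_affinity: float  # affinity regressor, e.g. predicted pKd (higher = tighter)
    pred_rmsd: float      # RMSD regressor, predicted ligand RMSD in angstroms (lower = better)
    physics_score: float  # physics-based score, energy-like (lower = better)


# Illustrative weights only; they are NOT taken from the paper.
DEFAULT_WEIGHTS = (1.0, 1.0, 0.5, 0.1)


def fused_score(
    o: PoseOutputs,
    weights: Tuple[float, float, float, float] = DEFAULT_WEIGHTS,
) -> float:
    """Hypothetical linear fusion; higher means more likely to be a true binder."""
    w_prob, w_aff, w_rmsd, w_phys = weights
    return (
        w_prob * o.pose_prob
        + w_aff * o.pred_affinity
        - w_rmsd * o.pred_rmsd      # penalize poses predicted to be far from native
        - w_phys * o.physics_score  # a lower (more negative) energy raises the score
    )


def rank_ligands(poses: Dict[str, List[PoseOutputs]]) -> List[Tuple[str, float]]:
    """Score each ligand by its best pose, then sort ligands best-first."""
    best = {name: max(fused_score(p) for p in plist)
            for name, plist in poses.items() if plist}
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    library = {
        "lig_A": [PoseOutputs(0.9, 7.5, 1.2, -9.0), PoseOutputs(0.2, 6.0, 5.0, -5.0)],
        "lig_B": [PoseOutputs(0.3, 5.0, 4.0, -6.0)],
    }
    for name, score in rank_ligands(library):
        print(name, round(score, 2))
```

Because the four terms live on different scales, a real system would normalize them or fit the weights on held-out data; a learned combiner such as logistic regression is an equally plausible alternative to fixed weights.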


Source journal

Journal of Cheminformatics CHEMISTRY, MULTIDISCIPLINARY-COMPUTER SCIENCE, INFORMATION SYSTEMS

CiteScore: 14.10
Self-citation rate: 7.00%
Review time: 3 months

Journal description: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software, and databases; molecular modelling; chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases; computer and molecular graphics; computer-aided molecular design; expert systems; QSAR; and data mining techniques.