PLAIG: Protein–Ligand Binding Affinity Prediction Using a Novel Interaction-Based Graph Neural Network Framework

Impact factor: 4.3 · JCR Q2 (Biochemistry & Molecular Biology)

ACS Bio & Med Chem Au · Pub Date: 2025-04-29 · DOI: 10.1021/acsbiomedchemau.5c00053

Madhav V. Samudrala, Somanath Dandibhotla, Arjun Kaneriya, and Sivanesan Dakshanamurthy*

ACS Bio & Med Chem Au 2025, 5 (3), 447–463. Article page: https://pubs.acs.org/doi/10.1021/acsbiomedchemau.5c00053

Citations: 0

Abstract


PLAIG: Protein–Ligand Binding Affinity Prediction Using a Novel Interaction-Based Graph Neural Network Framework

Rapid prediction of protein–ligand binding affinity is important in the drug discovery process. The advent of machine learning methods has increased the speed of these predictions. Previous machine learning models based on structural, sequence, and interaction-based approaches have shown potential but often tend to memorize training data due to incomplete feature representations that lead to poor generalization on external complexes. To address this challenge, here, we developed PLAIG, a Graph Neural Network (GNN)-based machine learning framework for generalized binding affinity prediction. PLAIG represents binding complexes as graphs, integrating protein–ligand interactions and molecular topology to uniquely capture interaction and structural features. To reduce overfitting, we tested principal component analysis (PCA) and ensemble learning with a stacking regressor. During benchmarking, PLAIG achieved a PCC of 0.78 on 4852 complexes from the PDBbind v.2019 refined set and 0.82 on 285 complexes from the v.2016 core set, outperforming many existing models. External validation on the DUDE-Z data set demonstrated its ability to differentiate active ligands from decoys, achieving an average AUC of 0.69 and a maximum AUC of 0.89. To enrich de novo prediction capabilities for subsequent model versions, PLAIG was hybridized with sequence- and structure-based models. The hybrid models achieved an average PCC of 0.88 on well-known drug–target complexes, with the best reaching a PCC of 0.98. Future work will incorporate an explicit inclusion of a docking methodology into PLAIG’s pipeline and assess its performance on de novo ligands. PLAIG is freely available at https://plaig-demo.streamlit.app/.
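The abstract reports two kinds of scores: the Pearson correlation coefficient (PCC) between predicted and experimental affinities, and the area under the ROC curve (AUC) for separating active ligands from decoys. As a rough illustration of what those numbers measure, the sketch below computes both from scratch using only the Python standard library. It is not taken from PLAIG's code; the function names `pearson_cc` and `roc_auc` and all sample values are hypothetical and chosen only for demonstration.

```python
import math


def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Covariance and variances, all without the 1/n factor (it cancels out).
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("PCC is undefined when one sequence is constant")
    return cov / math.sqrt(var_t * var_p)


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random active (label 1)
    scores higher than a random decoy (label 0); ties count as 0.5."""
    actives = [s for label, s in zip(labels, scores) if label == 1]
    decoys = [s for label, s in zip(labels, scores) if label == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


if __name__ == "__main__":
    # Hypothetical affinities (e.g. pK values); not data from the paper.
    experimental = [4.2, 5.1, 6.3, 7.0, 8.4]
    predicted = [4.8, 5.0, 6.9, 6.5, 8.0]
    print("PCC:", round(pearson_cc(experimental, predicted), 3))

    # Hypothetical screening scores: 1 = active ligand, 0 = decoy.
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.3, 0.1]
    print("AUC:", roc_auc(labels, scores))
```

A PCC of 1 means predictions track experimental affinities perfectly in a linear sense, while an AUC of 0.5 corresponds to random ranking of actives against decoys. The pairwise AUC loop is quadratic in the number of compounds, which is fine for a small illustration; a rank-based formulation would be preferable for large screening sets.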


Source journal

ACS Bio & Med Chem Au (Pharmaceuticals, Biology, Chemistry)

CiteScore: 4.10

Self-citation rate: 0.00%

Journal description: ACS Bio & Med Chem Au is a broad-scope, open access journal that publishes short letters, comprehensive articles, reviews, and perspectives in all aspects of biological and medicinal chemistry. Studies providing fundamental insights or describing novel syntheses, as well as clinical or other applications-based work, are welcomed. This broad scope includes experimental and theoretical studies on the chemical, physical, mechanistic, and/or structural basis of biological or cell function in all domains of life. It encompasses the fields of chemical biology, synthetic biology, disease biology, cell biology, agriculture and food, natural products research, nucleic acid biology, neuroscience, structural biology, and biophysics. The journal publishes studies that pertain to a broad range of medicinal chemistry, including compound design and optimization, biological evaluation, molecular mechanistic understanding of drug delivery and drug delivery systems, imaging agents, and pharmacology and translational science of both small and large bioactive molecules. Novel computational, cheminformatics, and structural studies for the identification (or structure-activity relationship analysis) of bioactive molecules, ligands, and their targets are also welcome. The journal will consider computational studies applying established computational methods, but only in combination with novel and original experimental data (e.g., in cases where new compounds have been designed and tested). Also included in the scope of the journal are articles relating to infectious diseases research on pathogens, host-pathogen interactions, therapeutics, diagnostics, vaccines, drug-delivery systems, and other biomedical technology development pertaining to infectious diseases.