Comparative evaluation of methods for the prediction of protein–ligand binding sites

IF 7.1 · Zone 2 (Chemistry) · Q1 · CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics · Pub Date: 2024-11-11 · DOI: 10.1186/s13321-024-00923-z

Javier S. Utgés, Geoffrey J. Barton



Abstract

The accurate identification of protein–ligand binding sites is critical to understanding and modulating protein function. Accordingly, ligand binding site prediction has remained a research focus for over three decades, with more than 50 methods developed and a paradigm shift from geometry-based approaches to machine learning. In this work, we collate 13 ligand binding site predictors spanning 30 years, focusing on the latest machine learning-based methods, such as VN-EGNN, IF-SitePred, GrASP, PUResNet and DeepPocket, and compare them with the established P2Rank, PRANK and fpocket and with earlier methods such as PocketFinder, Ligsite and Surfnet. We benchmark the methods against the human subset of our new curated reference dataset, LIGYSIS. LIGYSIS is a comprehensive protein–ligand complex dataset comprising 30,000 proteins with bound ligands; it aggregates biologically relevant, unique protein–ligand interfaces across the biological units of multiple structures of the same protein. LIGYSIS improves on earlier datasets used for testing methods, such as sc-PDB, PDBbind, Binding MOAD, COACH420 and HOLO4K, which either include 1:1 protein–ligand complexes or consider asymmetric units. Re-scoring of fpocket predictions by PRANK and DeepPocket displays the highest recall (60%), whilst IF-SitePred presents the lowest (39%). We demonstrate the detrimental effect of redundant binding site prediction on performance, as well as the beneficial impact of stronger pocket scoring schemes, with improvements of up to 14% in recall (IF-SitePred) and 30% in precision (Surfnet). Finally, we propose top-N+2 recall as the universal benchmark metric for ligand binding site prediction and urge authors to share the source code not only of their methods but also of their benchmarks.
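The abstract names top-N+2 recall without defining it. A common reading in the binding site prediction literature is: for a protein with N observed sites, keep only the N+2 highest-ranked predictions and count an observed site as recovered if any of those predictions lies close enough to it. The sketch below follows that reading. It is an illustration, not the authors' benchmark code: the function name, the input layout, the centre-to-centre distance criterion and the 12 Å default threshold are all assumptions.

```python
from math import dist  # Euclidean distance, Python 3.8+


def top_n_plus_k_recall(proteins, k=2, threshold=12.0):
    """Fraction of observed sites recovered by the top N+k predictions.

    proteins: list of (observed_centres, ranked_predicted_centres) pairs,
    where predictions are ordered best first and N = len(observed_centres).
    threshold: maximum centre-to-centre distance for a hit (assumed value).
    """
    hits = 0
    total = 0
    for observed, predicted in proteins:
        top = predicted[: len(observed) + k]  # keep only the N+k best-ranked
        for site in observed:
            total += 1
            if any(dist(site, pocket) <= threshold for pocket in top):
                hits += 1
    return hits / total if total else 0.0


# Toy protein with N = 2 observed sites and 5 ranked predictions (made-up data).
example = [
    (
        [(0.0, 0.0, 0.0), (30.0, 0.0, 0.0)],
        [
            (50.0, 0.0, 0.0),
            (100.0, 0.0, 0.0),
            (1.0, 0.0, 0.0),
            (200.0, 0.0, 0.0),
            (31.0, 0.0, 0.0),
        ],
    )
]

for k in (0, 2, 3):
    print(k, top_n_plus_k_recall(example, k=k))
```

In this toy case the good predictions are ranked third and fifth, so recall rises as the cut-off widens. This also shows why ranking quality, and not only pocket detection, drives the metric.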

Scientific contributions

This study conducts the largest benchmark of ligand binding site prediction methods to date, comparing 13 original methods and 15 variants using 10 informative metrics. The LIGYSIS dataset is introduced, which aggregates biologically relevant protein–ligand interfaces across multiple structures of the same protein. The study highlights the detrimental effect of redundant binding site prediction and demonstrates significant improvement in recall and precision through stronger scoring schemes. Finally, top-N+2 recall is proposed as a universal benchmark metric for ligand binding site prediction, with a recommendation for open-source sharing of both methods and benchmarks.
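The redundancy issue mentioned above can be illustrated with a simple greedy filter: visit pockets from best to worst score and discard any pocket whose centre falls too close to one already kept. This is a minimal sketch under stated assumptions, not the procedure used in the study. The function name, the (score, centre) input layout and the 5 Å separation are hypothetical, and a real criterion might also consider shared residues.

```python
from math import dist  # Euclidean distance, Python 3.8+


def drop_redundant(pockets, min_separation=5.0):
    """Greedy removal of near-duplicate pocket predictions.

    pockets: list of (score, centre) tuples, higher score meaning better.
    min_separation: centres closer than or equal to this distance are
    treated as the same site (assumed value).
    Returns the kept pockets, best score first.
    """
    kept = []
    for score, centre in sorted(pockets, key=lambda p: p[0], reverse=True):
        # Keep the pocket only if it is far from every pocket already kept.
        if all(dist(centre, other) > min_separation for _, other in kept):
            kept.append((score, centre))
    return kept


# Made-up predictions: two pairs of near-identical pockets.
pockets = [
    (0.6, (21.0, 0.0, 0.0)),
    (0.9, (0.0, 0.0, 0.0)),
    (0.8, (1.0, 0.0, 0.0)),
    (0.7, (20.0, 0.0, 0.0)),
]

print(drop_redundant(pockets))
```

Without such a filter, duplicates of one site can fill the top-ranked slots and push distinct sites below the N+2 cut-off, which is one way redundancy can lower recall.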

