Harnessing pre-trained models for accurate prediction of protein-ligand binding affinity.

IF 2.9 · Zone 3 (Biology) · JCR Q2 (Biochemical Research Methods)

BMC Bioinformatics Pub Date : 2025-02-17 DOI:10.1186/s12859-025-06064-w

Jiashan Li, Xinqi Gong

BMC Bioinformatics, vol. 26, no. 1, p. 55. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11834573/pdf/

Citations: 0

Abstract

Background: Protein-ligand binding is central to drug discovery, yet predicting it remains difficult. Existing methods are constrained by the limited availability of labeled data and often perform poorly on complex protein-ligand interactions. Many models also struggle to capture the conformational flexibility of proteins and ligands and their relative spatial relationships. These limitations hinder protein-ligand binding research and reduce the accuracy and efficiency of drug discovery. This study aims to improve predictive performance and so provide more reliable support for drug discovery.

Methods: This study leverages a pre-trained model with spatial awareness to enhance the prediction of protein-ligand binding affinity. By perturbing the structures of small molecules in a manner consistent with physical constraints and employing self-supervised tasks, we improve the representation of small molecule structures, allowing for better adaptation to affinity predictions. Meanwhile, our approach enables the identification of potential binding sites on proteins.
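The abstract does not say how the perturbation or the self-supervised task is implemented. As a rough illustration only, the sketch below assumes one common pattern: add small Gaussian noise to ligand atom coordinates, reject any sample that changes a bond length by more than a tolerance (a simple stand-in for "physical constraints"), and keep the applied noise as a denoising target. The function names, `sigma`, `tol`, and the toy three-atom ligand are hypothetical and are not taken from SableBind.

```python
import math
import random


def bond_lengths(coords, bonds):
    """Return the length of each bond, given atom coordinates and index pairs."""
    return [math.dist(coords[i], coords[j]) for i, j in bonds]


def perturb_ligand(coords, bonds, sigma=0.05, tol=0.1, max_tries=100, rng=None):
    """Add Gaussian noise to coordinates, keeping every bond length within tol.

    Returns (noisy_coords, noise). The noise is what a denoising model would
    be asked to predict. All parameters are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    reference = bond_lengths(coords, bonds)
    for _ in range(max_tries):
        noise = [tuple(rng.gauss(0.0, sigma) for _ in range(3)) for _ in coords]
        noisy = [
            tuple(c + d for c, d in zip(atom, delta))
            for atom, delta in zip(coords, noise)
        ]
        perturbed = bond_lengths(noisy, bonds)
        if all(abs(p - r) <= tol for p, r in zip(perturbed, reference)):
            return noisy, noise
    # No acceptable sample found: fall back to the unperturbed structure.
    return [tuple(atom) for atom in coords], [(0.0, 0.0, 0.0)] * len(coords)


# Toy three-atom "ligand" (coordinates in angstroms) with two bonds.
toy_coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
toy_bonds = [(0, 1), (1, 2)]
noisy_coords, target_noise = perturb_ligand(toy_coords, toy_bonds)
print(bond_lengths(noisy_coords, toy_bonds))
```

Rejection sampling is used here only because it is easy to read; a real pipeline might instead perturb torsion angles so that bond lengths are preserved by construction.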

Results: Our model demonstrates a significantly higher correlation coefficient in binding affinity predictions. Extensive evaluation on the PDBBind v2019 refined set, CASF, and Merck FEP benchmarks confirms the model's robustness and strong generalization across diverse datasets. Additionally, the model achieves over 95% in classification ROC for binding site identification, underscoring its high accuracy in pinpointing protein-ligand interaction regions.
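For readers unfamiliar with the two metrics mentioned here, the following minimal sketch computes a Pearson correlation coefficient (for predicted versus measured affinities) and an area under the ROC curve (for per-site binding labels). Reading "classification ROC" as ROC AUC is an assumption, as is the choice of Pearson correlation; the abstract names neither explicitly. The numbers are toy values, not results from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Labels must be 0 or 1, with at least
    one example of each class.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: measured vs. predicted affinities, and binding-site labels/scores.
measured = [4.2, 6.1, 7.5, 5.0]
predicted = [4.0, 6.5, 7.0, 5.4]
print(pearson_r(measured, predicted))

site_labels = [1, 0, 1, 0]
site_scores = [0.9, 0.2, 0.6, 0.7]
print(roc_auc(site_labels, site_scores))  # 0.75: 3 of 4 pairs ranked correctly
```

The pairwise AUC is quadratic in the number of examples, which is fine for illustration; library implementations use a sort-based computation instead.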

Conclusion: This research presents a novel approach that both improves the accuracy of binding affinity prediction and facilitates binding site identification, showcasing the potential of pre-trained models in computational drug design. Data and code are available at https://github.com/MIALAB-RUC/SableBind.


Source journal

BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months

About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.