Predicting Affinity Through Homology (PATH): Interpretable binding affinity prediction with persistent homology.

IF 3.6 · Zone 2 (Biology) · Q1 BIOCHEMICAL RESEARCH METHODS

PLoS Computational Biology · Pub Date: 2025-06-27 · eCollection Date: 2025-06-01 · DOI: 10.1371/journal.pcbi.1013216

Yuxi Long, Bruce R Donald

PLoS Computational Biology, 21(6): e1013216. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12226026/pdf/

Citations: 0

Abstract

Accurate binding affinity prediction (BAP) is crucial to structure-based drug design. We present PATH+, a novel, generalizable machine learning algorithm for BAP that exploits recent advances in computational topology. Compared to current BAP algorithms, PATH+ shows similar or better accuracy and is more generalizable across orthogonal datasets. PATH+ is not only one of the most accurate algorithms for BAP; it is also the first that is inherently interpretable. Interpretability, alongside generalizability, is a key factor in trusting an algorithm, and together they allow PATH+ to be trusted in critical applications such as inhibitor design. We visualized the features captured by PATH+ for two clinically relevant protein-ligand complexes and found that PATH+ captures binding-relevant structural mutations that are corroborated by biochemical data. Our work also sheds light on the previously poorly understood features that current computational topology BAP algorithms capture and that contribute to their high performance. PATH+ also offers an improvement of 𝒪((m + n)³) in computational complexity and is empirically over 10 times faster than the dominant (uninterpretable) computational topology algorithm for BAP. Based on insights from PATH+, we built PATH-, a scoring function for differentiating between binders and non-binders that has outstanding accuracy compared with 11 current algorithms for BAP. In summary, we report progress in a novel combination of interpretability, speed, and accuracy that should further empower topological screening of large virtual inhibitor libraries against protein targets, and allow binding affinity predictions to be understood and trusted. The source code for PATH+ and PATH- is released open-source as part of the OSPREY protein design software package.
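Persistent homology summarizes how the connectivity of a point cloud (for example, protein and ligand atom coordinates) changes as a distance threshold grows. The sketch below is a generic illustration under stated assumptions, not PATH+'s actual featurization: it computes only the 0-dimensional (connected-component) barcode of a Vietoris-Rips filtration using a union-find, then bins the bar death times into a fixed-length vector that a downstream regressor could consume. The function names `h0_persistence` and `barcode_histogram` and the bin edges are hypothetical; a real pipeline would more likely use a dedicated topology library such as GUDHI or Ripser and higher homology dimensions as well.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Finite H0 death times of the Vietoris-Rips filtration of `points` (hypothetical helper).

    Every point is born at filtration value 0. When the growing distance threshold
    first joins two components, one of them dies at that distance. The single
    component that never dies (the infinite bar) is omitted.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # In a Rips filtration, the edge (i, j) appears at its Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, so one bar ends here
            parent[ri] = rj
            deaths.append(d)
    return deaths


def barcode_histogram(deaths, bin_edges):
    """Count bars whose death time falls in each [lo, hi) bin (hypothetical feature vector)."""
    counts = [0] * (len(bin_edges) - 1)
    for d in deaths:
        for k in range(len(bin_edges) - 1):
            if bin_edges[k] <= d < bin_edges[k + 1]:
                counts[k] += 1
                break
    return counts
```

For example, `h0_persistence([(0, 0, 0), (1, 0, 0), (3, 0, 0), (3, 4, 0)])` returns `[1.0, 2.0, 4.0]`: each value is the distance at which one more component merges into the rest. Sorting all pairwise edges makes this sketch O(N² log N) in the number of points N, which is fine for illustration but not tuned for large complexes.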


Source journal

PLoS Computational Biology (BIOCHEMICAL RESEARCH METHODS; MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 7.10
Self-citation rate: 4.70%
Articles published: 820
Review time: 2.5 months

Journal description: PLOS Computational Biology features works of exceptional significance that further our understanding of living systems at all scales (from molecules and cells to patient populations and ecosystems) through the application of computational methods. Readers include life and computational scientists, who can take the important findings presented here to the next level of discovery. Research articles must be declared as belonging to a relevant section. More information about the sections can be found in the submission guidelines. Research articles should model aspects of biological systems, demonstrate both methodological and scientific novelty, and provide profound new biological insights. Generally, the reliability and significance of biological discovery through computation should be validated and enriched by experimental studies. Inclusion of experimental validation is not required for publication, but should be referenced where possible. Inclusion of experimental validation of a modest biological discovery through computation does not render a manuscript suitable for PLOS Computational Biology. Research articles specifically designated as Methods papers should describe outstanding methods of exceptional importance that have been shown to provide, or show promise of providing, new biological insights. The method must already be widely adopted, or show promise of wide adoption by a broad community of users. Enhancements to existing published methods will only be considered if those enhancements bring exceptional new capabilities.