Knowledge-enhanced and structure-enhanced representation learning for protein–ligand binding affinity prediction

IF 7.5 | CAS Zone 1 (Computer Science) | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Mei Li, Ye Cao, Xiaoguang Liu, Hua Ji
DOI: 10.1016/j.patcog.2025.111701
Journal: Pattern Recognition, Volume 166, Article 111701
Published: 2025-04-21
Publication type: Journal article (not open access)
URL: https://www.sciencedirect.com/science/article/pii/S0031320325003619
Citations: 0

Abstract

Protein–ligand binding affinity (PLA) prediction is a fundamental early stage in drug discovery and development. Existing methods mainly focus on structure-free prediction of binding affinities, and structural PLA prediction remains underexplored. The spatial structures of protein–ligand complexes are critical in determining binding affinities. A few graph neural network (GNN)-based methods model the spatial structures of complexes using pairwise atomic distances within a cutoff, which provides an insufficient spatial description and limits their ability to distinguish between certain molecules. In this paper, we propose a knowledge-enhanced and structure-enhanced representation learning method (KSM) for structural PLA prediction. KSM includes a specially designed structure-based GNN (KSGNN) that learns complete representations for PLA prediction by combining the sequence and structure information of complexes. Notably, KSGNN learns structure-aware representations by incorporating relative spatial information, namely distances and angles among atoms, into message passing. Additionally, we adopt an attentive pooling layer (APL) to further refine the structural patterns of complexes. We compare KSM against 18 state-of-the-art baselines on two benchmarks. KSM outperforms its competitors, improving RMSE by 0.0536 on the PDBbind core set and by 0.19 on the CSAR-HiQ dataset, demonstrating its superiority in binding affinity prediction.
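The abstract contrasts cutoff-based distance graphs with message passing that also sees angles between atoms. The sketch below is a generic illustration under stated assumptions, not KSM's implementation: the function names (`cutoff_edges`, `neighbor_angles`, `attentive_pool`, `rmse`) are hypothetical, the cutoff value is arbitrary, and the softmax readout is a common attention-pooling pattern assumed here because the abstract does not specify the exact form of APL. It shows how two three-atom arrangements can produce identical cutoff edge lists and distances while differing in the angle at the central atom.

```python
import math
from itertools import combinations

# Hypothetical helpers illustrating distance + angle features; NOT the KSM/KSGNN code.


def cutoff_edges(coords, cutoff):
    """Return (i, j, distance) for each atom pair with 0 < distance < cutoff."""
    edges = []
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        if 0.0 < d < cutoff:  # skip coincident atoms and pairs beyond the cutoff
            edges.append((i, j, d))
    return edges


def neighbor_angles(coords, edges):
    """Return (a, center, b, angle_rad) for every pair of neighbors a, b sharing a center atom."""
    neighbors = {}
    for i, j, _ in edges:
        neighbors.setdefault(i, []).append(j)
        neighbors.setdefault(j, []).append(i)
    angles = []
    for c, nbrs in neighbors.items():
        for a, b in combinations(sorted(nbrs), 2):
            u = [p - q for p, q in zip(coords[a], coords[c])]  # vector center -> a
            v = [p - q for p, q in zip(coords[b], coords[c])]  # vector center -> b
            cos = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
            angles.append((a, c, b, math.acos(max(-1.0, min(1.0, cos)))))  # clamp float error
    return angles


def attentive_pool(node_feats, score_weights):
    """Softmax-weighted sum of node vectors; a generic attention readout (assumed form)."""
    scores = [sum(w * x for w, x in zip(score_weights, h)) for h in node_feats]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    dim = len(node_feats[0])
    return [sum(a * h[k] for a, h in zip(alphas, node_feats)) for k in range(dim)]


def rmse(pred, true):
    """Root-mean-square error, the metric the abstract reports."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


# Two toy three-atom chains: straight (180 degrees) and bent (90 degrees) at atom 1.
LINEAR = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
BENT = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
CUTOFF = 1.2  # arbitrary; excludes the 0-2 pair in both chains

linear_edges = cutoff_edges(LINEAR, CUTOFF)
bent_edges = cutoff_edges(BENT, CUTOFF)
linear_angle = neighbor_angles(LINEAR, linear_edges)[0][3]
bent_angle = neighbor_angles(BENT, bent_edges)[0][3]
```

With a cutoff of 1.2, both chains yield the same edges, (0, 1, 1.0) and (1, 2, 1.0), so a distance-only graph cannot tell them apart, whereas the angle at atom 1 is π for the straight chain and π/2 for the bent one. A real model would typically embed such quantities (for example with learned radial and angular encodings) inside its message-passing layers rather than use them raw.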
Source journal: Pattern Recognition
Subject category: Engineering & Technology, Engineering: Electrical & Electronic
CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months
Journal description: The field of pattern recognition is both mature and rapidly evolving, playing a crucial role in related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas such as biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.