ML-PLA: Enhancing Protein-Ligand Binding Affinity Prediction with Microenvironment and Long-Range Interaction-Aware Graph Neural Networks.

IF 5.3 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling | Pub Date: 2025-10-03 | DOI: 10.1021/acs.jcim.5c01974

Yajie Meng, Zhuang Zhang, Jincan Li, Xianfang Tang, Changcheng Lu, Zilong Zhang, Feifei Cui, Pan Zeng, Bo Li, Junlin Xu


Citations: 0

Abstract

Accurately predicting protein-ligand binding affinity (PLA) is essential in drug discovery for identifying lead compounds. The sequence and structural contexts of an amino acid residue (i.e., microenvironment) describe the surrounding chemical properties and geometric features. While recent graph-based models have shown considerable promise, they often construct microenvironment representations using a shallow fusion of sequence and structural features, potentially failing to capture their full synergistic effects. Furthermore, the common reliance on a fixed distance threshold to define interaction space, while computationally efficient, inherently limits the ability to model key nonlocal biological phenomena. To address these issues, we propose a novel method named ML-PLA. Specifically, ML-PLA employs a heterogeneous graph neural network to model protein microenvironments by aggregating both sequence and structure information from neighboring nodes. Furthermore, we incorporate a vector quantized-variational autoencoder to capture the diversity and complexity of microenvironments, producing chemically meaningful, fine-grained representations. To effectively exploit long-range interaction information, ML-PLA projects protein-ligand complex atoms into multiple virtual atoms using a multihead attention mechanism, rather than simply increasing the number of graph neural network layers. This approach effectively embeds the interaction information into the complex atom features while simultaneously avoiding oversmoothing. Extensive experiments on the CASF-2016 and CASF-2013 benchmark data sets demonstrate the significant effectiveness and robust generalization capabilities of ML-PLA compared with state-of-the-art methods.
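
Two mechanisms named in the abstract can be illustrated with a minimal sketch: the nearest-codebook lookup at the core of a vector-quantized autoencoder, and attention through a small set of virtual atoms as a way to pass information between distant atoms in one step. This is not the authors' implementation. The function names (`quantize`, `virtual_atom_update`), the use of raw features as queries, keys, and values (a trained model would normally apply learned projections), the residual update, and all sizes are assumptions made for illustration.

```python
import math
import random


def quantize(vec, codebook):
    """Return (index, code) of the codebook entry nearest to vec (squared L2)."""
    dists = [sum((x - c) ** 2 for x, c in zip(vec, code)) for code in codebook]
    idx = dists.index(min(dists))
    return idx, codebook[idx]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention: each query gets a weighted mean of values."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) * scale for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[d] for wi, v in zip(w, values))
                    for d in range(len(values[0]))])
    return out


def split_heads(rows, num_heads):
    """Split every feature vector into num_heads equal-width slices."""
    size = len(rows[0]) // num_heads
    return [[r[h * size:(h + 1) * size] for r in rows] for h in range(num_heads)]


def multihead_attend(queries, keys, values, num_heads):
    """Run attention independently per head, then concatenate head outputs."""
    heads = [attend(q, k, v) for q, k, v in zip(split_heads(queries, num_heads),
                                                split_heads(keys, num_heads),
                                                split_heads(values, num_heads))]
    return [sum((heads[h][i] for h in range(num_heads)), [])
            for i in range(len(queries))]


def virtual_atom_update(atoms, virtual_queries, num_heads=2):
    """Hypothetical long-range update routed through virtual atoms.

    1. Pool: every virtual atom attends over all real atoms, ignoring distance.
    2. Broadcast: every real atom attends over the virtual atoms.
    3. Residual: the broadcast message is added to the original atom features.
    """
    virtual = multihead_attend(virtual_queries, atoms, atoms, num_heads)
    message = multihead_attend(atoms, virtual, virtual, num_heads)
    return [[a + m for a, m in zip(atom, msg)] for atom, msg in zip(atoms, message)]


# Toy usage: 5 atoms with 4 features each, 3 virtual atoms (sizes are arbitrary).
random.seed(0)
atoms = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
virtual_queries = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
updated = virtual_atom_update(atoms, virtual_queries, num_heads=2)
```

In this sketch every atom receives information from every other atom after a single pool-and-broadcast round, whereas a distance-thresholded message-passing layer would need one layer per hop. That is the sense in which such a scheme avoids simply stacking more graph layers.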


Source journal

Journal of Chemical Information and Modeling (Chemistry: Chemistry, Multidisciplinary)

CiteScore: 9.80

Self-citation rate: 10.70%

Articles published: 529

Review time: 1.4 months

About the journal: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.