ML-PLA: Enhancing Protein-Ligand Binding Affinity Prediction with Microenvironment and Long-Range Interaction-Aware Graph Neural Networks
Authors: Yajie Meng, Zhuang Zhang, Jincan Li, Xianfang Tang, Changcheng Lu, Zilong Zhang, Feifei Cui, Pan Zeng, Bo Li, Junlin Xu
Journal: Journal of Chemical Information and Modeling (impact factor 5.3; JCR Q1, Chemistry, Medicinal)
Publication date: 2025-10-03
Publication type: Journal Article
DOI: 10.1021/acs.jcim.5c01974 (https://doi.org/10.1021/acs.jcim.5c01974)
Citations: 0
Abstract
Accurately predicting protein-ligand binding affinity (PLA) is essential in drug discovery for identifying lead compounds. The sequence and structural contexts of an amino acid residue (i.e., microenvironment) describe the surrounding chemical properties and geometric features. While recent graph-based models have shown considerable promise, they often construct microenvironment representations using a shallow fusion of sequence and structural features, potentially failing to capture their full synergistic effects. Furthermore, the common reliance on a fixed distance threshold to define interaction space, while computationally efficient, inherently limits the ability to model key nonlocal biological phenomena. To address these issues, we propose a novel method named ML-PLA. Specifically, ML-PLA employs a heterogeneous graph neural network to model protein microenvironments by aggregating both sequence and structure information from neighboring nodes. Furthermore, we incorporate a vector quantized-variational autoencoder to capture the diversity and complexity of microenvironments, producing chemically meaningful, fine-grained representations. To effectively exploit long-range interaction information, ML-PLA projects protein-ligand complex atoms into multiple virtual atoms using a multihead attention mechanism, rather than simply increasing the number of graph neural network layers. This approach effectively embeds the interaction information into the complex atom features while simultaneously avoiding oversmoothing. Extensive experiments on the CASF-2016 and CASF-2013 benchmark data sets demonstrate the significant effectiveness and robust generalization capabilities of ML-PLA compared with state-of-the-art methods.
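The abstract mentions a vector quantized-variational autoencoder for turning microenvironment embeddings into fine-grained discrete representations. As a rough illustration of the quantization step alone, the sketch below maps each embedding to its nearest codebook vector by squared Euclidean distance, which is the standard VQ-VAE lookup. The function names, the two-dimensional toy embeddings, and the three-entry codebook are all hypothetical; the abstract does not specify the codebook size, the embedding dimension, or how ML-PLA trains the codebook, so this should not be read as the authors' implementation.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(embedding: Vector, codebook: Sequence[Vector]) -> Tuple[int, Vector]:
    """Return the index and value of the codebook entry nearest to `embedding`.

    This is only the nearest-neighbour lookup of a VQ-VAE; the encoder,
    decoder, and codebook training losses are deliberately left out.
    """
    index = min(
        range(len(codebook)),
        key=lambda k: squared_distance(embedding, codebook[k]),
    )
    return index, codebook[index]


# Hypothetical toy data: three codebook entries and three residue
# microenvironment embeddings (assumed values, not from the paper).
codebook: List[Vector] = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
residue_embeddings: List[Vector] = [(0.9, 0.1), (0.1, 0.8), (0.2, 0.1)]

# Each residue is assigned a discrete microenvironment token.
tokens = [quantize(e, codebook)[0] for e in residue_embeddings]
print(tokens)
```

In this toy setting each residue ends up with a single integer token, so residues whose embeddings fall near the same codebook vector share one discrete microenvironment type.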
About the journal:
The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery.
Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field.
As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.