Donya Ohadi, Kiran Kumar, Suchitra Ravula, Renee L DesJarlais, Mark J Seierstad, Amy Y Shih, Michael D Hack, Jamie M Schiffer
{"title":"Input Pose is Key to Performance of Free Energy Perturbation: Benchmarking with Monoacylglycerol Lipase.","authors":"Donya Ohadi, Kiran Kumar, Suchitra Ravula, Renee L DesJarlais, Mark J Seierstad, Amy Y Shih, Michael D Hack, Jamie M Schiffer","doi":"10.1021/acs.jcim.4c01223","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01223","url":null,"abstract":"<p>Free energy perturbation (FEP) methodologies have become commonplace for modeling potency in the hit-to-lead and lead optimization stages of drug discovery. The conformational states of the initial poses of compounds for FEP+ calculations are often set up by alignment to a cocrystal structure ligand, but it is not clear whether this method provides the best result for all proteins or all ligands. Beyond ligand conformational states, the selection of crystallographic water molecules for inclusion in the FEP input structures can also affect FEP models. Here, we report the results of FEP calculations using FEP+ from Schrödinger, starting from maximum common substructure alignment and from docked poses generated with an array of docking methodologies. As a benchmark data set, we use monoacylglycerol lipase (MAGL), an important clinical drug target in cancer malignancy, neurological diseases, and metabolic disorders, and a set of 17 MAGL inhibitors. We found that FEP+ correlations with experimental IC<sub>50</sub> values varied widely depending on the method used to generate the input pose, and that, for some methods, including ligand-based information in the docking process increases the correlation between FEP+ free energies and IC<sub>50</sub> values. Upon analysis of the initial poses, we found that the differences in FEP+ correlations stemmed from rotation around a tertiary amide bond as well as translation of the compound toward the more hydrophobic side of the MAGL pocket. FEP+ estimation improved across all pose modeling methods when hydrogen bond constraint information was added. However, simple maximum common substructure alignment in the presence of all crystallographic water molecules outperformed all other methods in the correlation between estimated and experimental IC<sub>50</sub> values. Taken together, these findings suggest that pose selection and crystallographic water inclusion greatly impact how well FEP+ estimated IC<sub>50</sub> values align with experimental IC<sub>50</sub> values and that modelers should benchmark a few different pose generation methodologies and different water inclusion strategies for their hit-to-lead and lead optimization drug discovery projects.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142666471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
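The comparison of pose-generation protocols above rests on correlating FEP+ free energies with experimental IC50 values. The sketch below shows one way to set up that comparison. It converts IC50 to an approximate free energy with ΔG ≈ RT ln(IC50) and computes a Pearson correlation for each protocol. This is an assumption-based illustration, not the authors' analysis. The abstract does not name the correlation metric, and treating IC50 as a stand-in for a dissociation constant is only an approximation. The function names, protocol labels, and the 298.15 K temperature are choices made for this example.

```python
import math

# Assumed constants: gas constant in kcal/(mol*K) and room temperature in K.
R_KCAL = 1.987204e-3
T_KELVIN = 298.15


def ic50_to_dg(ic50_molar: float) -> float:
    """Approximate binding free energy (kcal/mol) from an IC50 given in mol/L.

    Treats IC50 as a proxy for the dissociation constant, which is only an
    approximation (it ignores, e.g., substrate competition in the assay).
    """
    return R_KCAL * T_KELVIN * math.log(ic50_molar)


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def compare_pose_protocols(fep_dg: dict[str, list[float]],
                           ic50_molar: list[float]) -> dict[str, float]:
    """Pearson r between FEP-predicted dG and IC50-derived dG for each protocol.

    fep_dg maps a hypothetical protocol name (e.g. "mcs_all_waters", "docked")
    to predicted free energies, listed in the same compound order as ic50_molar.
    """
    exp_dg = [ic50_to_dg(v) for v in ic50_molar]
    return {name: pearson_r(pred, exp_dg) for name, pred in fep_dg.items()}
```

Pearson r ignores a constant offset between predicted and experimental values. A protocol that is systematically shifted can therefore still score well, so an error metric such as RMSE is often reported alongside it.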
{"title":"Widespread Misinterpretation of p<i>K</i><sub>a</sub> Terminology for Zwitterionic Compounds and Its Consequences.","authors":"Jonathan W Zheng, Ivo Leito, William H Green","doi":"10.1021/acs.jcim.4c01420","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01420","url":null,"abstract":"<p>The acid dissociation constant (p<i>K</i><sub>a</sub>), which quantifies the propensity for a solute to donate a proton to its solvent, is crucial for drug design and synthesis, environmental fate studies, chemical manufacturing, and many other fields. Unfortunately, the terminology used to describe acid-base phenomena is sometimes inconsistent, creating considerable potential for misinterpretation. In this work, we examine a systematic confusion underlying the definition of \"acidic\" and \"basic\" p<i>K</i><sub>a</sub> values for zwitterionic compounds. Due to this confusion, some p<i>K</i><sub>a</sub> data are misrepresented in data repositories, including the widely used and highly trusted ChEMBL database. Such datasets are frequently used to supply training data for p<i>K</i><sub>a</sub> prediction models; hence, confusion and errors in the data degrade model performance. Herein, we discuss the intricacies of this issue. We make suggestions for describing acid-base phenomena, training p<i>K</i><sub>a</sub> prediction models, and stewarding p<i>K</i><sub>a</sub> datasets, given the high potential for confusion and its potentially large impact on downstream applications.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142666199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
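Consider an ampholyte such as glycine. Its approximate literature macroscopic pKa values, about 2.34 and 9.60, are used here only as illustrative inputs. Both constants describe successive deprotonations: first from the cation to the net-neutral species, then from the net-neutral species to the anion.

The minimal sketch below computes species fractions from two macroscopic pKa values. It illustrates that the numbers alone do not record which value a database labels "acidic" or "basic"; that label is a separate convention, and the abstract reports it is easily confused. The function names are hypothetical. The sketch also ignores the micro-equilibrium between the zwitterion and the uncharged tautomer.

```python
def diprotic_fractions(ph: float, pka1: float, pka2: float) -> tuple[float, float, float]:
    """Fractions of the H2A, HA and A species of a diprotic compound at a given pH.

    pka1 < pka2 are macroscopic pKa values. For an amino acid, the three species are
    the cation, the net-neutral form (mostly zwitterionic) and the anion.
    """
    h = 10.0 ** (-ph)
    ka1, ka2 = 10.0 ** (-pka1), 10.0 ** (-pka2)
    denom = h * h + h * ka1 + ka1 * ka2
    return h * h / denom, h * ka1 / denom, ka1 * ka2 / denom


def isoelectric_point(pka1: float, pka2: float) -> float:
    """pH at which cation and anion fractions are equal (mean net charge zero)."""
    return (pka1 + pka2) / 2.0
```

With the glycine-like inputs, more than 99% of the compound sits in the net-neutral form near pH 6. Neither constant, taken alone, is a "basic pKa" in the sense of a pKb.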
Wagner da Rocha, Leo Liberti, Antonio Mucherino, Thérèse E Malliavin
{"title":"Influence of Stereochemistry in a Local Approach for Calculating Protein Conformations.","authors":"Wagner da Rocha, Leo Liberti, Antonio Mucherino, Thérèse E Malliavin","doi":"10.1021/acs.jcim.4c01232","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01232","url":null,"abstract":"<p>Protein structure prediction is generally based on the use of local conformational information coupled with long-range distance restraints. Such restraints can be derived from the knowledge of a template structure or from the analysis of protein sequence alignments in the framework of models arising from the physics of disordered systems. The accuracy of approaches based on sequence alignment, however, is limited when the number of aligned sequences is small. Here, we derive protein conformations using only knowledge of local conformations by means of the interval Branch-and-Prune algorithm. The computational efficiency is directly related to the knowledge of stereochemistry (bond angle and ω values) along the protein sequence and, in particular, to the variations of the torsion angle ω. The impact of stereochemistry variations is particularly strong for protein topologies defined by numerous long-range restraints, as in the case of proteins with β secondary structure. The systematic enumeration of the conformations improves the efficiency of the calculations. Analysis of the DNA codons makes it possible to connect the variations of the torsion angle ω to the positions of rare DNA codons.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142666470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
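The Branch-and-Prune idea can be illustrated with a deliberately simplified, discrete version:

- Atoms are placed one at a time from a bond length, a bond angle and a torsion angle, using the standard NeRF construction.
- Each new atom branches over a small set of allowed torsions.
- A partial chain is pruned as soon as it violates a distance restraint.

This is an assumption-laden sketch, not the interval Branch-and-Prune (iBP) implementation used in the work. The uniform bond length and bond angle, the three torsion choices, and the names `place` and `branch_and_prune` are all hypothetical.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def unit(a: Vec) -> Vec:
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n)


def place(a: Vec, b: Vec, c: Vec, bond: float, angle: float, torsion: float) -> Vec:
    """NeRF placement of atom d with |cd| = bond, angle(b, c, d) = angle and
    dihedral(a, b, c, d) = torsion (radians; 0 = cis, pi = trans)."""
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))
    m = cross(n, bc)
    d = (-bond * math.cos(angle),
         bond * math.sin(angle) * math.cos(torsion),
         bond * math.sin(angle) * math.sin(torsion))
    return tuple(c[k] + d[0] * bc[k] + d[1] * m[k] + d[2] * n[k] for k in range(3))


def branch_and_prune(n_atoms, bond, angle, torsion_choices, restraints, tol=1e-6):
    """Enumerate torsion sequences whose chains satisfy all distance restraints.

    restraints maps an atom pair (i, j) with i < j to (lower, upper) distance bounds.
    """
    # Fix the first three atoms to remove rigid-body degrees of freedom.
    start = [(0.0, 0.0, 0.0), (bond, 0.0, 0.0),
             (bond - bond * math.cos(angle), bond * math.sin(angle), 0.0)]
    solutions = []

    def feasible(coords):
        # Only restraints involving the newest atom need checking.
        j = len(coords) - 1
        for i in range(j):
            bounds = restraints.get((i, j))
            if bounds is not None:
                d = math.dist(coords[i], coords[j])
                if d < bounds[0] - tol or d > bounds[1] + tol:
                    return False
        return True

    def search(coords, torsions):
        if len(coords) == n_atoms:
            solutions.append(tuple(torsions))
            return
        for t in torsion_choices:  # branch
            coords.append(place(coords[-3], coords[-2], coords[-1], bond, angle, t))
            if feasible(coords):  # prune infeasible partial chains immediately
                torsions.append(t)
                search(coords, torsions)
                torsions.pop()
            coords.pop()

    search(start, [])
    return solutions
```

Without restraints, the search enumerates every combination: three choices for each torsion beyond the first three atoms. Each restraint that can be checked early removes whole subtrees, which is where such methods gain their efficiency.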
Romanos Fasoulis, Georgios Paliouras, Lydia E Kavraki
{"title":"RankMHC: Learning to Rank Class-I Peptide-MHC Structural Models.","authors":"Romanos Fasoulis, Georgios Paliouras, Lydia E Kavraki","doi":"10.1021/acs.jcim.4c01278","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01278","url":null,"abstract":"<p>The binding of peptides to class-I Major Histocompatibility Complex (MHC) receptors and their subsequent recognition downstream by T-cell receptors are crucial processes that enable most multicellular organisms to fight various diseases. Thus, the identification of peptide antigens that can elicit an immune response is of immense importance for developing successful therapies for bacterial and viral infections, and even cancer. Recently, studies have demonstrated the importance of peptide-MHC (pMHC) structural analysis, with pMHC structural modeling methods gradually becoming more popular in peptide antigen identification workflows. Most pMHC structural modeling tools provide an ensemble of candidate peptide poses in the MHC-I cleft, each associated with a score from a scoring function, with the top-scoring pose assumed to be the most representative of the ensemble. However, identifying the binding mode, that is, the peptide pose from the ensemble that is closest to the (unavailable) native structure, is not trivial. Often, the peptide poses characterized as best by a protein-ligand scoring function are not the ones most representative of the actual structure. In this work, we frame the peptide binding pose identification problem as a Learning-to-Rank (LTR) problem. We present RankMHC, an LTR-based pMHC binding mode identification predictor, which is specifically trained to predict the most accurate ranking of an ensemble of pMHC conformations. RankMHC outperforms classical peptide-ligand scoring functions, as well as previous Machine Learning (ML)-based binding pose predictors. We further demonstrate that RankMHC can be used with many pMHC structural modeling tools that use different structural modeling protocols.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142646381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
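Framing pose identification as Learning-to-Rank means training on pairs of poses from the same ensemble rather than on absolute scores. The minimal sketch below uses a RankNet-style pairwise logistic loss with a linear scoring function. Within each ensemble, the pose with the lower RMSD to the native structure should score higher. It illustrates the general LTR formulation only. RankMHC's actual features, model class and loss are not described in the abstract, and the two-dimensional pose features and toy RMSD values below are hypothetical.

```python
import math


def score(w: list[float], x: list[float]) -> float:
    """Linear scoring function s(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_ranker(ensembles, n_features, lr=0.1, epochs=200):
    """Fit a linear ranker with a pairwise logistic (RankNet-style) loss.

    ensembles: one list per pMHC complex, each holding (rmsd, features) tuples for
    its candidate peptide poses. Only pairs within the same ensemble are compared,
    and the lower-RMSD pose of each pair should receive the higher score.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for poses in ensembles:
            for rmsd_i, x_i in poses:
                for rmsd_j, x_j in poses:
                    if rmsd_i < rmsd_j:  # pose i should outrank pose j
                        margin = score(w, x_i) - score(w, x_j)
                        # derivative of log(1 + exp(-margin)) with respect to margin
                        coef = -1.0 / (1.0 + math.exp(margin))
                        for k in range(n_features):
                            grad[k] += coef * (x_i[k] - x_j[k])
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def rank_poses(w, poses):
    """Pose indices ordered from highest to lowest predicted score."""
    return sorted(range(len(poses)), key=lambda i: score(w, poses[i][1]), reverse=True)
```

The top-ranked index returned by `rank_poses` plays the role of the predicted binding mode. Only relative order within an ensemble matters, which is the key difference from fitting an absolute scoring function.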
{"title":"MD-LAIs Software: Computing Whole-Sequence and Amino Acid-Level \"Embeddings\" for Peptides and Proteins.","authors":"Ernesto Contreras-Torres, Yovani Marrero-Ponce","doi":"10.1021/acs.jcim.3c01189","DOIUrl":"https://doi.org/10.1021/acs.jcim.3c01189","url":null,"abstract":"<p>Several computational tools have been developed to calculate sequence-based molecular descriptors (MDs) for peptides and proteins. However, these tools have certain limitations: 1) they generally lack capabilities for curating input data; 2) their outputs often exhibit significant overlap; 3) there is limited availability of MDs at the amino acid (<i>aa</i>) level; and 4) they lack flexibility in computing specific MDs. To address these issues, we developed <b>MD-LAIs</b> (<b>M</b>olecular <b>D</b>escriptors from <b>L</b>ocal <b>A</b>mino acid <b>I</b>nvariant<b>s</b>), a Java-based software package designed to compute both whole-sequence and <i>aa</i>-level MDs for peptides and proteins. These MDs are generated by applying aggregation operators (<b>AOs</b>) to macromolecular vectors containing the chemical-physical and structural properties of <i>aas</i>. The set of <b>AOs</b> includes both nonclassical (e.g., Minkowski norms) and classical <b>AOs</b> (e.g., Radial Distribution Function). Classical <b>AOs</b> capture neighborhood structural information at different <i>k</i> levels, while nonclassical <b>AOs</b> are applied using a sliding window to generalize the <i>aa</i>-level output. A weighting system based on fuzzy membership functions is also included to account for the contributions of individual <i>aas</i>. <b>MD-LAIs</b> features 1) a module for data curation tasks, 2) a feature selection module, 3) projects of highly relevant MDs, and 4) low-dimensional lists of informative global and <i>aa</i>-level MDs. Overall, we expect that <b>MD-LAIs</b> will be a valuable tool for encoding protein or peptide sequences. The software is freely available as a stand-alone system on GitHub (https://github.com/Grupo-Medicina-Molecular-y-Traslacional/MD_LAIS).</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142646378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
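The core idea of aggregation operators can be sketched in a few lines. Each residue is mapped to a property value, and an operator then reduces the vector in one of two ways: over the whole sequence to give a global descriptor, or over a sliding window to give one value per residue. This Python sketch is not MD-LAIs' API; the tool is written in Java and offers far more operators plus fuzzy weighting. The function names are hypothetical, and the Kyte-Doolittle hydropathy scale stands in for the chemical-physical properties the software uses.

```python
from typing import Callable

# Kyte-Doolittle hydropathy index, used here only as an example amino acid property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_vector(sequence: str, scale: dict[str, float]) -> list[float]:
    """Map a sequence to a per-residue property vector (unknown letters raise KeyError)."""
    return [scale[aa] for aa in sequence.upper()]


def minkowski_norm(values: list[float], p: float) -> float:
    """Whole-sequence descriptor: Minkowski p-norm of the property vector."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


def sliding_window(values: list[float], half_width: int,
                   op: Callable[[list[float]], float]) -> list[float]:
    """Residue-level descriptors: apply an aggregation operator to a centered window.

    Windows are truncated at the sequence ends, so there is one value per residue.
    """
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half_width), min(len(values), i + half_width + 1)
        out.append(op(values[lo:hi]))
    return out
```

For example, `sliding_window(property_vector("GAVLIK", KYTE_DOOLITTLE), 1, mean)` yields a smoothed hydropathy profile. `minkowski_norm(..., 2)` of the same vector gives one global number for the whole peptide.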
Zhiyuan Zhou, Yueming Yin, Hao Han, Yiping Jia, Jun Hong Koh, Adams Wai-Kin Kong, Yuguang Mu
{"title":"ProAffinity-GNN: A Novel Approach to Structure-Based Protein-Protein Binding Affinity Prediction via a Curated Data Set and Graph Neural Networks.","authors":"Zhiyuan Zhou, Yueming Yin, Hao Han, Yiping Jia, Jun Hong Koh, Adams Wai-Kin Kong, Yuguang Mu","doi":"10.1021/acs.jcim.4c01850","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01850","url":null,"abstract":"<p>Protein-protein interactions (PPIs) are crucial for understanding biological processes and disease mechanisms, contributing significantly to advances in protein engineering and drug discovery. The accurate determination of binding affinities, essential for decoding PPIs, faces challenges due to the substantial time and financial costs involved in experimental and theoretical methods. This situation underscores the urgent need for more effective and precise methodologies for predicting binding affinity. Despite the abundance of research on PPI modeling, the field of quantitative binding affinity prediction remains underexplored, mainly due to a lack of comprehensive data. This study seeks to address these needs by manually curating pairwise interaction labels on available 3D structures of protein complexes with experimentally determined binding affinities, creating the largest data set to date for structure-based pairwise protein interaction with binding affinity. Subsequently, we introduce ProAffinity-GNN, a novel deep learning framework that uses a protein language model and a graph neural network (GNN) to improve the accuracy of structure-based protein-protein binding affinity prediction. Evaluation results across several benchmark test sets and an additional case study demonstrate that ProAffinity-GNN not only outperforms existing models in accuracy but also shows strong generalization capabilities.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142666472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
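To show what "graph neural network" means in this setting, here is a generic message-passing step. Each node combines its own features with the mean of its neighbors' features, and the node embeddings are then sum-pooled into one vector that is mapped to a scalar affinity. This is not ProAffinity-GNN's architecture, whose layer types and readout the abstract does not specify. The sketch assumes nodes could be residues carrying embeddings from a protein language model and edges could be residue contacts. All weights are untrained placeholders, and the function names are hypothetical.

```python
def relu(v: list[float]) -> list[float]:
    return [max(0.0, x) for x in v]


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def message_passing_layer(h, edges, w_self, w_neigh):
    """One mean-aggregation layer: h_i' = relu(W_self h_i + W_neigh mean_j h_j)."""
    neighbors = {i: [] for i in range(len(h))}
    for i, j in edges:  # undirected edges, e.g. hypothetical residue contacts
        neighbors[i].append(j)
        neighbors[j].append(i)
    out = []
    for i, hi in enumerate(h):
        if neighbors[i]:
            agg = [sum(h[j][k] for j in neighbors[i]) / len(neighbors[i]) for k in range(len(hi))]
        else:
            agg = [0.0] * len(hi)
        out.append(relu([a + b for a, b in zip(matvec(w_self, hi), matvec(w_neigh, agg))]))
    return out


def predict_affinity(h, edges, w_self, w_neigh, w_out):
    """Sum-pool the updated node embeddings and map the graph vector to a scalar."""
    h = message_passing_layer(h, edges, w_self, w_neigh)
    pooled = [sum(node[k] for node in h) for k in range(len(h[0]))]
    return sum(wk * pk for wk, pk in zip(w_out, pooled))
```

Mean aggregation and sum pooling make the prediction independent of how the nodes are numbered, which is a basic requirement for any structure-based graph model.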
{"title":"Transparent Machine Learning Model to Understand Drug Permeability through the Blood-Brain Barrier.","authors":"Hengjian Jia, Gabriele C Sosso","doi":"10.1021/acs.jcim.4c01217","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01217","url":null,"abstract":"<p>The blood-brain barrier (BBB) selectively regulates the passage of chemical compounds into and out of the central nervous system (CNS). As such, understanding the permeability of drug molecules through the BBB is key to treating neurological diseases and evaluating the response of the CNS to medical treatments. Within the last two decades, a diverse portfolio of machine learning (ML) models has regularly been used as a tool to predict and, to a much lesser extent, understand several functional properties of medicinal drugs, including their propensity to pass through the BBB. However, the most numerically accurate models to date lack transparency, as they typically rely on complex blends of different descriptors (or features or fingerprints), many of which are not necessarily interpretable in a straightforward fashion. In fact, the \"black-box\" nature of these models has prevented us from pinpointing any specific design rule to craft the next generation of pharmaceuticals that need to pass (or not) through the BBB. In this work, we have developed an ML model that leverages an uncomplicated, transparent set of descriptors to predict the permeability of drug molecules through the BBB. In addition to its simplicity, our model achieves accuracy comparable to that of state-of-the-art models. Moreover, we use a naive Bayes model as an analytical tool to provide further insights into the structure-function relation that underpins the capacity of a given drug molecule to pass through the BBB. Although our results are computational rather than experimental, we have identified several molecular fragments and functional groups that may significantly impact a drug's likelihood of permeating the BBB. This work provides a unique angle on the BBB problem and lays the foundations for future work aimed at leveraging additional transparent descriptors, potentially obtained via bespoke molecular dynamics simulations.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142666184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
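Using naive Bayes as an analytical tool typically means reading its per-feature probabilities rather than only its predictions. The minimal sketch below fits a Bernoulli naive Bayes model with Laplace smoothing on binary fragment-presence features. It reports, for each fragment, the log-likelihood ratio between BBB-permeable and non-permeable compounds, where a positive value favors permeation. The fragment labels and the four-molecule training set are invented for illustration, and the abstract does not state which descriptors or naive Bayes variant the authors used.

```python
import math


def fit_bernoulli_nb(samples, labels, alpha=1.0):
    """Bernoulli naive Bayes on binary fragment fingerprints.

    samples: list of sets of fragment names present in each molecule.
    labels: list of booleans (True = BBB-permeable); both classes must occur.
    Returns class log-priors and Laplace-smoothed fragment presence probabilities.
    """
    vocab = sorted(set().union(*samples))
    model = {"vocab": vocab, "log_prior": {}, "p": {}}
    for cls in (True, False):
        members = [s for s, y in zip(samples, labels) if y == cls]
        model["log_prior"][cls] = math.log(len(members) / len(samples))
        model["p"][cls] = {f: (sum(f in s for s in members) + alpha) / (len(members) + 2 * alpha)
                           for f in vocab}
    return model


def log_likelihood_ratio(model, fragment):
    """log P(present | BBB+) - log P(present | BBB-); positive values favour permeation."""
    return math.log(model["p"][True][fragment]) - math.log(model["p"][False][fragment])


def predict_proba(model, fragments):
    """Posterior probability of BBB permeation for a molecule's set of fragments."""
    scores = {}
    for cls in (True, False):
        s = model["log_prior"][cls]
        for f in model["vocab"]:
            p = model["p"][cls][f]
            s += math.log(p if f in fragments else 1.0 - p)
        scores[cls] = s
    top = max(scores.values())
    num = math.exp(scores[True] - top)
    return num / (num + math.exp(scores[False] - top))
```

Sorting the vocabulary by `log_likelihood_ratio` gives a ranked list of fragments that, under this model, push a compound toward or away from permeation. Such a ranking is a correlation in the training data, not a causal design rule.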
David A Schaller, Clara D Christ, John D Chodera, Andrea Volkamer
{"title":"Benchmarking Cross-Docking Strategies in Kinase Drug Discovery.","authors":"David A Schaller, Clara D Christ, John D Chodera, Andrea Volkamer","doi":"10.1021/acs.jcim.4c00905","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c00905","url":null,"abstract":"<p>In recent years, machine learning has transformed many aspects of the drug discovery process, including small molecule design, for which the prediction of bioactivity is an integral part. Leveraging structural information about the interactions between a small molecule and its protein target has great potential for downstream machine learning scoring approaches but is fundamentally limited by the accuracy with which protein-ligand complex structures can be predicted in a reliable and automated fashion. With the goal of finding practical approaches to generating useful kinase-inhibitor complex geometries for downstream machine learning scoring approaches, we present a kinase-centric docking benchmark that evaluates different classes of docking and pose selection strategies by how well they recapitulate experimentally observed binding modes in a realistic cross-docking scenario. The assembled benchmark data set focuses on the well-studied protein kinase family and comprises a subset of 589 protein structures cocrystallized with 423 ATP-competitive ligands. We find that the docking methods biased by the cocrystallized ligand, utilizing shape overlap with or without maximum common substructure matching, are more successful in recovering binding poses than standard physics-based docking alone. Also, docking into multiple structures significantly increases the chance of generating a low root-mean-square deviation (RMSD) docking pose. Docking into the structures with the most similar cocrystallized ligands according to the maximum common substructure (MCS), using an approach that combines all three methods (Posit), proved to be the most efficient way to reproduce binding poses, achieving a success rate of 70.4% across all included systems. The studied docking and pose selection strategies, which utilize the OpenEye Toolkits, were implemented into pipelines of the KinoML framework, allowing automated and reliable protein-ligand complex generation for future downstream machine learning tasks. Although focused on protein kinases, we believe that the general findings can also be transferred to other protein families.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142666469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
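The success rate quoted above depends on two choices: how pose RMSD is measured, and how results from several receptor structures are combined. The sketch below computes an in-place ligand RMSD and a best-of-N success rate per ligand, reflecting the observation that docking into multiple structures raises the chance of a low-RMSD pose. It makes three assumptions. First, the receptor structures have already been superposed into a common frame, so ligand RMSD is computed without refitting. Second, atom order matches and symmetry is ignored. Third, the 2 Å cutoff is a common convention that the abstract does not state. The function names are hypothetical.

```python
import math


def rmsd(pose, reference):
    """In-place RMSD (same atom order, same frame, no superposition, no symmetry)."""
    if len(pose) != len(reference):
        raise ValueError("poses must have the same number of atoms")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(pose, reference))
    return math.sqrt(sq / len(pose))


def success_rate(results, threshold=2.0):
    """Fraction of ligands whose best pose over all receptor structures is below threshold.

    results maps a ligand id to a list of (pose, reference) coordinate pairs,
    one per receptor structure the ligand was docked into.
    """
    hits = 0
    for pairs in results.values():
        best = min(rmsd(pose, ref) for pose, ref in pairs)
        hits += best < threshold
    return hits / len(results)
```

Taking the minimum over structures measures whether a good pose was generated at all. A realistic pipeline must also select that pose without knowing the reference, which is the pose selection problem the benchmark also evaluates.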
{"title":"Combining a Chemical Language Model and the Structure-Activity Relationship Matrix Formalism for Generative Design of Potent Compounds with Core Structure and Substituent Modifications.","authors":"Hengwei Chen, Jürgen Bajorath","doi":"10.1021/acs.jcim.4c01781","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01781","url":null,"abstract":"<p>In medicinal chemistry, compound optimization relies on the generation of analogue series (AS) for exploring structure-activity relationships (SARs). Potency progression is a critical criterion for advancing AS. During optimization, a key question is which analogues to synthesize next. We introduce a new computational methodology, reported here for the first time, for extending AS with potent compounds containing both core structure and substituent modifications at multiple sites. The approach combines a transformer chemical language model (CLM) with a SAR matrix (SARM) methodology that identifies and organizes structurally related AS. For this purpose, the SARM approach was extended to cover multisite AS. Consensus series extracted from SARMs and representing a potency gradient served as input for CLM training to extend test AS with potent analogues. Different model variants were derived and investigated. Both general and fine-tuned models correctly predicted known potent analogues at high ranks in probability-based compound rankings and chemically diversified AS through core structure modifications of the generated candidate compounds and substituent replacements at multiple sites.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142638031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
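A SAR matrix organizes analogues so that rows share a core structure and columns share a substituent. Empty cells then correspond to combinations not yet made, which are candidate virtual analogues. The minimal sketch below builds such a grid from (core, substituent, potency) records and lists its empty cells. It is a simplified, single-site illustration under stated assumptions. The fragmentation into cores and substituents is assumed to be done elsewhere, the string labels are placeholders rather than real structures, and neither the multisite extension nor the chemical language model component is shown. The function names are hypothetical.

```python
from collections import defaultdict


def build_sar_matrix(records):
    """Arrange analogues into a core x substituent grid of potency values.

    records: iterable of (core, substituent, potency) tuples, e.g. produced by a
    prior matched-molecular-pair style fragmentation (not shown here).
    """
    matrix = defaultdict(dict)
    for core, sub, potency in records:
        matrix[core][sub] = potency
    cores = sorted(matrix)
    subs = sorted({s for row in matrix.values() for s in row})
    return cores, subs, matrix


def virtual_analogues(cores, subs, matrix):
    """Empty cells: core/substituent combinations without a measured compound."""
    return [(c, s) for c in cores for s in subs if s not in matrix[c]]
```

Reading a row or column of the grid in order of potency gives the kind of potency gradient that, per the abstract, was used to select consensus series for model training.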
{"title":"A Divide-and-Conquer Approach to Nanoparticle Global Optimisation Using Machine Learning.","authors":"Nicholas B Smith, Anna L Garden","doi":"10.1021/acs.jcim.4c01516","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01516","url":null,"abstract":"<p>Global optimization of the structure of atomic nanoparticles is often hampered by the presence of many funnels on the potential energy surface. While broad funnels are readily encountered and easily exploited by the search, narrow funnels are more difficult to locate and explore, presenting a problem if the global minimum is situated in such a funnel. Here, a machine-learning-based divide-and-conquer approach is applied to overcome the issue posed by the multifunnel effect, without using <i>a priori</i> knowledge of the potential energy surface. This approach begins with a truncated exploration to gather coarse-grained knowledge of the potential energy surface. This knowledge is then used to train a Gaussian mixture model that divides the potential energy surface into separate regions, each of which is then explored in more detail (or conquered) separately. This scheme was tested on a variety of multifunnel systems and significantly reduced the time taken to locate the global minima of the Lennard-Jones (LJ) nanoparticles LJ<sub>75</sub> and LJ<sub>104</sub>, as well as of two metallic systems, Au<sub>55</sub> and Pd<sub>88</sub>. However, difficulties were encountered for LJ<sub>98</sub>, providing insight into how the scheme could be further improved.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142638030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
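The "divide" step can be illustrated with a Gaussian mixture fitted by expectation-maximization to the minima collected during the coarse exploration. Each minimum is then assigned to its most probable component, which defines the regions. This is a one-dimensional sketch under explicit assumptions. Each sampled minimum is summarized by a single hypothetical structural descriptor, the number of components is fixed by hand, and the subsequent per-region "conquer" searches are not shown. The descriptors and model settings used in the work are not given in the abstract, and the function names are hypothetical.

```python
import math


def gmm_1d(xs, k=2, iters=100, var_floor=1e-6):
    """Fit a one-dimensional Gaussian mixture by expectation-maximization.

    Means start evenly spaced between min(xs) and max(xs), variances at the overall
    data variance and weights uniform: a simple, deterministic choice for this sketch.
    Returns (weights, means, variances, responsibilities).
    """
    n = len(xs)
    lo, hi = min(xs), max(xs)
    means = [lo + (hi - lo) * j / (k - 1) for j in range(k)] if k > 1 else [sum(xs) / n]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, var_floor)] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for numerical stability
        resp = []
        for x in xs:
            logp = [math.log(w) - 0.5 * math.log(2.0 * math.pi * v) - (x - m) ** 2 / (2.0 * v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                               var_floor)
    return weights, means, variances, resp


def partition(xs, resp):
    """Assign each sampled minimum to its most probable component (its region)."""
    regions = {}
    for x, r in zip(xs, resp):
        regions.setdefault(max(range(len(r)), key=r.__getitem__), []).append(x)
    return regions
```

In a full scheme, each region returned by `partition` would seed its own constrained global search. A narrow funnel that holds few of the coarse samples then still gets a dedicated exploration instead of being swamped by the broad funnels.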