Donya Ohadi, Kiran Kumar, Suchitra Ravula, Renee L DesJarlais, Mark J Seierstad, Amy Y Shih, Michael D Hack, Jamie M Schiffer
{"title":"Input Pose is Key to Performance of Free Energy Perturbation: Benchmarking with Monoacylglycerol Lipase.","authors":"Donya Ohadi, Kiran Kumar, Suchitra Ravula, Renee L DesJarlais, Mark J Seierstad, Amy Y Shih, Michael D Hack, Jamie M Schiffer","doi":"10.1021/acs.jcim.4c01223","DOIUrl":"10.1021/acs.jcim.4c01223","url":null,"abstract":"<p><p>Free energy perturbation (FEP) methodologies have become commonplace methods for modeling potency in hit-to-lead and lead optimization stages of drug discovery. The conformational states of the initial poses of compounds for FEP+ calculations are often set up by alignment to a cocrystal structure ligand, but it is not clear if this method provides the best result for all proteins or all ligands. Not only are ligand conformational states potential variables in modeling compound potency in FEP but also the selection of crystallographic water molecules for inclusion in the FEP input structures can impact FEP models. Here, we report the results of FEP calculations using FEP+ from Schrödinger and starting from maximum common substructure alignment and docked poses generated with an array of docking methodologies. As a benchmark data set, we use monoacylglycerol lipase (MAGL), an important clinical drug target in cancer malignancy, neurological diseases, and metabolic disorders, and a set of 17 MAGL inhibitors. We found a large variation among FEP+ correlations to experimental IC<sub>50</sub> values depending on the method used to generate the input pose and that the inclusion of ligand-based information in the docking process, with some methods, increases the correlation between FEP+ free energies and IC<sub>50</sub> values. Upon analysis of the initial poses, we found that the differences in FEP+ correlations stemmed from rotation around a tertiary amide bond as well as translation of the compound toward the more hydrophobic side of the MAGL pocket. 
FEP+ estimation improved across all pose modeling methods when hydrogen bond constraint information was added. However, simple maximum common substructure alignment in the presence of all crystallographic water molecules outperformed all other methods in correlation between estimated and experimental IC<sub>50</sub> values. Taken together, these findings suggest that pose selection and crystallographic water inclusion greatly impact how well FEP+ estimated IC<sub>50</sub> values align with experimental IC<sub>50</sub> values and that modelers should benchmark a few different pose generation methodologies and different water inclusion strategies for their hit-to-lead and lead optimization drug discovery projects.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"8859-8869"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142666471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mingqing Liu, Xuechun Meng, Yiyang Mao, Hongqi Li, Ji Liu
{"title":"ReduMixDTI: Prediction of Drug-Target Interaction with Feature Redundancy Reduction and Interpretable Attention Mechanism.","authors":"Mingqing Liu, Xuechun Meng, Yiyang Mao, Hongqi Li, Ji Liu","doi":"10.1021/acs.jcim.4c01554","DOIUrl":"10.1021/acs.jcim.4c01554","url":null,"abstract":"<p><p>Identifying drug-target interactions (DTIs) is essential for drug discovery and development. Existing deep learning approaches to DTI prediction often employ powerful feature encoders to represent drugs and targets holistically, which usually cause significant redundancy and noise by neglecting the restricted binding regions. Furthermore, many previous DTI networks ignore or simplify the complex intermolecular interaction process involving diverse binding types, which significantly limits both predictive ability and interpretability. We propose ReduMixDTI, an end-to-end model that addresses feature redundancy and explicitly captures complex local interactions for DTI prediction. In this study, drug and target features are encoded by using graph neural networks and convolutional neural networks, respectively. These features are refined from channel and spatial perspectives to enhance the representations. The proposed attention mechanism explicitly models pairwise interactions between drug and target substructures, improving the model's understanding of binding processes. In extensive comparisons with seven state-of-the-art methods, ReduMixDTI demonstrates superior performance across three benchmark data sets and external test sets reflecting real-world scenarios. Additionally, we perform comprehensive ablation studies and visualize protein attention weights to enhance the interpretability. 
The results confirm that ReduMixDTI serves as a robust and interpretable model for reducing feature redundancy, contributing to advances in DTI prediction.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"8952-8962"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142685415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining a Chemical Language Model and the Structure-Activity Relationship Matrix Formalism for Generative Design of Potent Compounds with Core Structure and Substituent Modifications.","authors":"Hengwei Chen, Jürgen Bajorath","doi":"10.1021/acs.jcim.4c01781","DOIUrl":"10.1021/acs.jcim.4c01781","url":null,"abstract":"<p><p>In medicinal chemistry, compound optimization relies on the generation of analogue series (AS) for exploring structure-activity relationships (SARs). Potency progression is a critical criterion for advancing AS. During optimization, a key question is which analogues to synthesize next. We introduce a new computational methodology for the extension of AS with potent compounds containing both core structure and substituent modifications at multiple sites, which has been reported for the first time. The approach combines a transformer chemical language model (CLM) with a SAR matrix (SARM) methodology that identifies and organizes structurally related AS. Therefore, the SARM approach was expanded to cover multisite AS. Consensus series extracted from SARMs representing a potency gradient served as input for CLM training to extend test AS with potent analogues. Different model variants were derived and investigated. 
Both general and fine-tuned models correctly predicted known potent analogues at high positions in probability-based compound rankings and chemically diversified AS through core structure modifications of the generated candidate compounds and substituent replacements at multiple sites.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"8784-8795"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142638031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Widespread Misinterpretation of p<i>K</i><sub>a</sub> Terminology for Zwitterionic Compounds and Its Consequences.","authors":"Jonathan W Zheng, Ivo Leito, William H Green","doi":"10.1021/acs.jcim.4c01420","DOIUrl":"10.1021/acs.jcim.4c01420","url":null,"abstract":"<p><p>The acid dissociation constant (p<i>K</i><sub>a</sub>), which quantifies the propensity for a solute to donate a proton to its solvent, is crucial for drug design and synthesis, environmental fate studies, chemical manufacturing, and many other fields. Unfortunately, the terminology used for describing acid-base phenomena is sometimes inconsistent, causing large potential for misinterpretation. In this work, we examine a systematic confusion underlying the definition of \"acidic\" and \"basic\" p<i>K</i><sub>a</sub> values for zwitterionic compounds. Due to this confusion, some p<i>K</i><sub>a</sub> data are misrepresented in data repositories, including the widely used and highly trusted ChEMBL database. Such datasets are frequently used to supply training data for p<i>K</i><sub>a</sub> prediction models, and hence, confusion and errors in the data make the model performance worse. Herein, we discuss the intricacies of this issue. 
We make suggestions for describing acid-base phenomena, training p<i>K</i><sub>a</sub> prediction models, and stewarding p<i>K</i><sub>a</sub> datasets, given the high potential for confusion and potentially high impact in downstream applications.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"8838-8847"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142666199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification of Macrophage-Associated Novel Drug Targets in Atherosclerosis Based on Integrated Transcriptome Features.","authors":"Jingzhi Wang, Sida Qin, Xiaohui Zhang, Jixin Zhi","doi":"10.1021/acs.jcim.4c01558","DOIUrl":"10.1021/acs.jcim.4c01558","url":null,"abstract":"<p><strong>Background: </strong>This study explores the pathological mechanisms of atherosclerosis (AS), focusing on the role of macrophages in its formation and development, and potential therapeutic targets.</p><p><strong>Methods: </strong>The heterogeneity of the AS single-cell data set GSE131778 was analyzed using Seurat. Tissue sequencing data GSE28829 and GSE43292 were analyzed for immune cell abundance using CIBERSORT. Differentially expressed genes were identified, and WGCNA was used to create a coexpression network. Hub genes were identified using MCODE and CytoHubba and analyzed with GO and KEGG enrichment analysis, GSVA, and immune infiltration analysis. DrugBank identified potential drugs, and molecular docking verified drug binding to key targets. Key targets were experimentally validated.</p><p><strong>Results: </strong>Nineteen cell clusters were identified in the GSE131778 data set, classified into ten cell types. Macrophages in AS and normal tissues were identified based on cell abundance. CIBERSORT showed a significant increase in cell cluster 9 in AS samples. Thirty-two hub genes, including CD86, LILRB2, and IRF8, were validated. GO and KEGG analyses indicated that hub genes primarily affect immune functions. GSVA identified 29 significantly increased pathways in AS samples. Immune infiltration analysis revealed a positive correlation between IRF8, CD86, and LILRB2 expression and macrophage content. Molecular docking suggested CD86 as a potential drug target for AS. 
qRT-PCR confirmed increased IRF8 and CD86 expression.</p><p><strong>Conclusions: </strong>CD86, LILRB2, and IRF8 are highly expressed in foam cell samples, with CD86 forming hydrogen bonds with several AS drugs, indicating CD86 as a promising target for AS treatment.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"9009-9020"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142680023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Prediction of Ligand-Protein Binding Affinities by Meta-modeling.","authors":"Ho-Joon Lee, Prashant S Emani, Mark B Gerstein","doi":"10.1021/acs.jcim.4c01116","DOIUrl":"10.1021/acs.jcim.4c01116","url":null,"abstract":"<p><p>The accurate screening of candidate drug ligands against target proteins through computational approaches is of prime interest to drug development efforts. Such virtual screening depends in part on methods to predict the binding affinity between ligands and proteins. Many computational models for binding affinity prediction have been developed, but with varying results across targets. Given that ensembling or meta-modeling approaches have shown great promise in reducing model-specific biases, we develop a framework to integrate published force-field-based empirical docking and sequence-based deep learning models. In building this framework, we evaluate many combinations of individual base models, training databases, and several meta-modeling approaches. We show that many of our meta-models significantly improve affinity predictions over base models. Our best meta-models achieve comparable performance to state-of-the-art deep learning tools exclusively based on 3D structures while allowing for improved database scalability and flexibility through the explicit inclusion of features such as physicochemical properties or molecular descriptors. We further demonstrate improved generalization capability by our models using a large-scale benchmark of affinity prediction as well as a virtual screening application benchmark. 
Overall, we demonstrate that diverse modeling approaches can be ensembled together to gain meaningful improvement in binding affinity prediction.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"8684-8704"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11632770/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142692246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Romanos Fasoulis, Georgios Paliouras, Lydia E Kavraki
{"title":"RankMHC: Learning to Rank Class-I Peptide-MHC Structural Models.","authors":"Romanos Fasoulis, Georgios Paliouras, Lydia E Kavraki","doi":"10.1021/acs.jcim.4c01278","DOIUrl":"10.1021/acs.jcim.4c01278","url":null,"abstract":"<p><p>The binding of peptides to class-I Major Histocompatibility Complex (MHC) receptors and their subsequent recognition downstream by T-cell receptors are crucial processes for most multicellular organisms to be able to fight various diseases. Thus, the identification of peptide antigens that can elicit an immune response is of immense importance for developing successful therapies for bacterial and viral infections, even cancer. Recently, studies have demonstrated the importance of peptide-MHC (pMHC) structural analysis, with pMHC structural modeling methods gradually becoming more popular in peptide antigen identification workflows. Most of the pMHC structural modeling tools provide an ensemble of candidate peptide poses in the MHC-I cleft, each associated with a score stemming from a scoring function, with the top scoring pose assumed to be the most representative of the ensemble. However, identifying the binding mode, that is, the peptide pose from the ensemble that is closest to an unavailable native structure, is not trivial. Oftentimes, the peptide poses characterized as best by a protein-ligand scoring function are not the ones that are the most representative of the actual structure. In this work, we frame the peptide binding pose identification problem as a Learning-to-Rank (LTR) problem. We present RankMHC, an LTR-based pMHC binding mode identification predictor, which is specifically trained to predict the most accurate ranking of an ensemble of pMHC conformations. RankMHC outperforms classical peptide-ligand scoring functions, as well as previous Machine Learning (ML)-based binding pose predictors. 
We further demonstrate that RankMHC can be used with many pMHC structural modeling tools that use different structural modeling protocols.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"8729-8742"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11633655/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142646381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ordinal Confidence Level Assignments for Regression Model Predictions","authors":"Steven Kearnes, Patrick Riley","doi":"10.1021/acs.jcim.4c01755","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01755","url":null,"abstract":"<p>We present a simple method for assigning accurate confidence levels to molecular property predictions from regression models. These confidence levels are easy to interpret and useful for making decisions in drug discovery programs. We demonstrate their performance using time-split validation with assay data from the Relay Therapeutics internal database.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"64 24","pages":"9299–9305"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142874909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wagner da Rocha, Leo Liberti, Antonio Mucherino, Thérèse E Malliavin
{"title":"Influence of Stereochemistry in a Local Approach for Calculating Protein Conformations.","authors":"Wagner da Rocha, Leo Liberti, Antonio Mucherino, Thérèse E Malliavin","doi":"10.1021/acs.jcim.4c01232","DOIUrl":"10.1021/acs.jcim.4c01232","url":null,"abstract":"<p><p>Protein structure prediction is generally based on the use of local conformational information coupled with long-range distance restraints. Such restraints can be derived from the knowledge of a template structure or the analysis of protein sequence alignment in the framework of models arising from the physics of disordered systems. The accuracy of approaches based on sequence alignment, however, is limited in the case where the number of aligned sequences is small. Here, we derive protein conformations using only knowledge of local conformations by means of the interval Branch-and-Prune algorithm. The computational efficiency is directly related to the knowledge of stereochemistry (bond angle and ω values) along the protein sequence and, in particular, to the variations of the torsion angle ω. The impact of stereochemistry variations is particularly strong in the case of protein topologies defined from numerous long-range restraints, as in the case of proteins with β secondary structures. The systematic enumeration of the conformations improves the efficiency of the calculations. 
The analysis of DNA codons makes it possible to connect the variations of torsion angle ω to the positions of rare DNA codons.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"8999-9008"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142666470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christian Hölzer, Rick Oerder, Stefan Grimme, Jan Hamaekers
{"title":"ConfRank: Improving GFN-FF Conformer Ranking with Pairwise Training.","authors":"Christian Hölzer, Rick Oerder, Stefan Grimme, Jan Hamaekers","doi":"10.1021/acs.jcim.4c01524","DOIUrl":"10.1021/acs.jcim.4c01524","url":null,"abstract":"<p><p>Conformer ranking is a crucial task for drug discovery, with methods for generating conformers often based on molecular (meta)dynamics or sophisticated sampling techniques. These methods are constrained by the underlying force computation regarding runtime and energy ranking accuracy, limiting their effectiveness for large-scale screening applications. To address these ranking limitations, we introduce ConfRank, a machine learning-based approach that enhances conformer ranking using pairwise training. We demonstrate its performance using GFN-FF-generated conformer ensembles, leveraging the DimeNet++ architecture trained on pairs of 159 760 uncharged organic compounds from the GEOM data set with r<sup>2</sup>SCAN-3c reference level. Instead of predicting only on single molecules, this approach captures relative energy differences between conformers, leading to a significant improvement of the overall conformational ranking, outperforming GFN-FF and GFN2-xTB. Thereby, the pairwise RMSD of the relative energy difference of two conformers can be reduced from 5.65 to 0.71 kcal mol<sup>-1</sup> on the test data set, allowing up to 81% of all lowest-lying conformers to be identified correctly (GFN-FF: 10%, GFN2-xTB: 47%). The ConfRank approach is cost-effective, allowing for scalable deployment on both CPU and GPU, achieving runtime accelerations by up to 2 orders of magnitude compared to GFN2-xTB. Out-of-sample investigations on CREST-generated conformer ensembles from the QM9 data set and conformers taken from an extended GMTKN55 data set show promising results for the robustness of this approach. 
Thereby, ranking correlation coefficients such as Spearman's can be improved to 0.90 (GFN-FF: 0.39, GFN2-xTB: 0.84), reducing the probability of an incorrect sign flip in pairwise energy comparison from 32 to 7%. On the extended GMTKN55 subsets, the pairwise MAD (RMSD) could be reduced on almost all subsets by up to 62% (58%) with an average improvement of 30% (29%). Moreover, an exemplary case study on vancomycin shows similar performance, indicating applicability to larger (bio)molecular structures. Furthermore, we motivate the usage of the pairwise training approach from a theoretical perspective, highlighting that while pairwise training can lead to a decline in single sample prediction of absolute energies for ML models, it significantly enhances conformer ranking performance. The data and models used in this study are available at https://github.com/grimme-lab/confrank.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"8909-8925"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142680019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}