{"title":"NCAP: Noncanonical Amino Acid Parameterization Software for CHARMM Potentials","authors":"Richard E. Overstreet*, Dennis G. Thomas* and John R. Cort","doi":"10.1021/acs.jcim.4c00986","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c00986","url":null,"abstract":"<p>Noncanonical amino acids (ncAAs) provide numerous avenues for the introduction of novel functionality to peptides and proteins. ncAAs can be incorporated through solid-phase synthesis or genetic code expansion in conjugation with heterologous expression of the encoded protein modification. Due to the difficulty of synthesis or overexpression, wide chemical space, and lack of empirically resolved structures, modeling the effects of ncAA mutation is critical for rational protein design. To evaluate the structural and functional perturbations ncAAs introduce, we utilize molecular potentials that describe the forces in the protein structure. Most potentials such as CHARMM are designed to model canonical residues but can be parametrized to include novel ncAAs. In this work, we introduce NCAP, a software package to generate CHARMM-compatible parameters from quantum chemical calculations. Unlike currently available tools, NCAP is designed to recognize the ncAA structure and automatically bridge the gap between quantum chemical calculations and CHARMM potential parameters. For our software, we discuss the workflow, validation against canonical parameter sets, and comparison with published ncAA-protein structures.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":"64 24","pages":"9424–9432"},"PeriodicalIF":5.6,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142870153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Docking-Informed Machine Learning for Kinome-wide Affinity Prediction","authors":"Jordy Schifferstein, Andrius Bernatavicius and Antonius P.A. Janssen*","doi":"10.1021/acs.jcim.4c01260","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01260","url":null,"abstract":"<p>Kinase inhibitors are an important class of anticancer drugs, with 80 inhibitors clinically approved and >100 in active clinical testing. Most bind competitively in the ATP-binding site, leading to challenges with selectivity for a specific kinase, resulting in risks for toxicity and general off-target effects. Assessing the binding of an inhibitor for the entire kinome is experimentally possible but expensive. A reliable and interpretable computational prediction of kinase selectivity would greatly benefit the inhibitor discovery and optimization process. Here, we use machine learning on docked poses to address this need. To this end, we aggregated all known inhibitor-kinase affinities and generated the complete accompanying 3D interactome by docking all inhibitors to the respective high-quality X-ray structures. We then used this resource to train a neural network as a kinase-specific scoring function, which achieved an overall performance (<i>R</i><sup>2</sup>) of 0.63–0.74 on unseen inhibitors across the kinome. The entire pipeline from molecule to 3D-based affinity prediction has been fully automated and wrapped in a freely available package. This has a graphical user interface that is tightly integrated with PyMOL to allow immediate adoption in the medicinal chemistry practice.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":"64 24","pages":"9196–9204"},"PeriodicalIF":5.6,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.acs.org/doi/epdf/10.1021/acs.jcim.4c01260","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142870152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Influence of Data Curation and Confidence Levels on Compound Predictions Using Machine Learning Models","authors":"Elena Xerxa, Martin Vogt and Jürgen Bajorath*","doi":"10.1021/acs.jcim.4c01573","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01573","url":null,"abstract":"<p>While data curation principles and practices are a major topic in data science, they are often not explicitly considered in machine learning (ML) applications in chemistry. We have been interested in evaluating the potential effects of data curation on the performance of molecular ML models. Therefore, a sequential curation scheme was developed for compounds and activity data, and different ML classification models were generated at increasing data confidence levels and evaluated. Sequential data curation was found to systematically increase classification performance in an incremental manner due to cumulative effects of individual data curation criteria. The analysis of chemical space distributions of compound subsets at different data confidence levels revealed that the separation of compounds with different class labels in chemical space generally increased during sequential activity data curation, which was mostly due to subsequent elimination of singletons rather than compounds from analogue series. These findings provided a rationale for increasing the classification performance of ML models as a consequence of increasingly stringent data curation. Taken together, the results reported herein suggest that further attention should be paid to varying data curation and confidence levels when deriving and assessing ML models for chemical applications.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":"64 24","pages":"9341–9349"},"PeriodicalIF":5.6,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142874986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effect of Water Networks On Ligand Binding: Computational Predictions vs Experiments.","authors":"Tibor Viktor Szalai, Dávid Bajusz, Rita Börzsei, Balázs Zoltán Zsidó, Janez Ilaš, György G Ferenczy, Csaba Hetényi, György M Keserű","doi":"10.1021/acs.jcim.4c01291","DOIUrl":"10.1021/acs.jcim.4c01291","url":null,"abstract":"<p>Rational drug design focuses on the explanation and prediction of complex formation between therapeutic targets and small-molecule ligands. As a third and often overlooked interacting partner, water molecules play a critical role in the thermodynamics of protein-ligand binding, impacting both the entropy and enthalpy components of the binding free energy and by extension, on-target affinity and bioactivity. The community has realized the importance of binding site waters, as evidenced by the number of computational tools to predict the structure and thermodynamics of their networks. However, quantitative experimental characterization of relevant protein-ligand-water systems, and consequently the validation of these modeling methods, remains challenging. Here, we investigated the impact of solvent exchange from light (H<sub>2</sub>O) to heavy water (D<sub>2</sub>O) to provide complete thermodynamic profiling of these ternary systems. Utilizing the solvent isotope effects, we gain a deeper understanding of the energetic contributions of various components. Specifically, we conducted isothermal titration calorimetry experiments on trypsin with a series of <i>p</i>-substituted benzamidines, as well as carbonic anhydrase II (CAII) with a series of aromatic sulfonamides. Significant differences in binding enthalpies found between light vs heavy water indicate a substantial role of the binding site water network in protein-ligand binding. Next, we challenged two conceptually distinct modeling methods, the grid-based WaterFLAP and the molecular dynamics-based MobyWat, by predicting and scoring relevant water networks. The predicted water positions accurately reproduce those in available high-resolution X-ray and neutron diffraction structures of the relevant protein-ligand complexes. Estimated energetic contributions of the identified water networks were corroborated by the experimental thermodynamics data. Besides providing a direct validation for the predictive power of these methods, our findings confirmed the importance of considering binding site water networks in computational ligand design.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":" ","pages":"8980-8998"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11632780/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142685409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AABBA Graph Kernel: Atom-Atom, Bond-Bond, and Bond-Atom Autocorrelations for Machine Learning.","authors":"Lucía Morán-González, Jørn Eirik Betten, Hannes Kneiding, David Balcells","doi":"10.1021/acs.jcim.4c01583","DOIUrl":"10.1021/acs.jcim.4c01583","url":null,"abstract":"<p>Graphs are one of the most natural and powerful representations available for molecules; natural because they have an intuitive correspondence to skeletal formulas, the language used by chemists worldwide, and powerful, because they are highly expressive both globally (molecular topology) and locally (atom and bond properties). Graph kernels are used to transform molecular graphs into fixed-length vectors, which, based on their capacity of measuring similarity, can be used as fingerprints for machine learning (ML). To date, graph kernels have mostly focused on the atomic nodes of the graph. In this work, we developed a graph kernel based on atom-atom, bond-bond, and bond-atom (AABBA) autocorrelations. The resulting vector representations were tested on regression ML tasks on a data set of transition metal complexes; a benchmark motivated by the higher complexity of these compounds relative to organic molecules. In particular, we tested different flavors of the AABBA kernel in the prediction of the energy barriers and bond distances of the Vaska's complex data set (Friederich et al., <i>Chem. Sci.</i>, 2020, <b>11,</b> 4584). For a variety of ML models, including neural networks, gradient boosting machines, and Gaussian processes, we showed that AABBA outperforms the baseline including only atom-atom autocorrelations. Dimensionality reduction studies also showed that the bond-bond and bond-atom autocorrelations yield many of the most relevant features. We believe that the AABBA graph kernel can accelerate the exploration of large chemical spaces and inspire novel molecular representations in which both atomic and bond properties play an important role.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":" ","pages":"8756-8769"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11632777/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142708590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Application of Machine Learning in Doping Detection.","authors":"Qingqing Yang, Wennuo Xu, Xiaodong Sun, Qin Chen, Bing Niu","doi":"10.1021/acs.jcim.4c01234","DOIUrl":"10.1021/acs.jcim.4c01234","url":null,"abstract":"<p>Detecting doping agents in sports poses a significant challenge due to the continuous emergence of new prohibited substances and methods. Traditional detection methods primarily rely on targeted analysis, which is often labor-intensive and is susceptible to errors. In response, machine learning offers a transformative approach to enhancing doping screening and detection. With its powerful data analysis capabilities, machine learning enables the rapid identification of patterns and features in complex compound data, increasing both the efficiency and the accuracy of detection. Moreover, when integrated with nontargeted metabolomics, machine learning can predict unknown metabolites, aiding the discovery of long-lasting biomarkers of doping. It also excels in classifying novel compounds, thereby reducing false-negative rates. As instrumental analysis and machine learning technologies continue to advance, the development of rapid, scalable, and highly efficient doping detection methods becomes increasingly feasible, supporting the pursuit of fairness and integrity in sports competitions.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":" ","pages":"8673-8683"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142685417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CPIScore: A Deep Learning Approach for Rapid Scoring and Interpretation of Protein-Ligand Binding Interactions.","authors":"Li Liang, Yunxin Duan, Chen Zeng, Boheng Wan, Huifeng Yao, Haichun Liu, Tao Lu, Yanmin Zhang, Yadong Chen, Jun Shen","doi":"10.1021/acs.jcim.4c01175","DOIUrl":"10.1021/acs.jcim.4c01175","url":null,"abstract":"<p>Protein-ligand binding affinity prediction is a crucial and challenging task in the field of drug discovery. However, traditional simulation-based computational approaches are often prohibitively time-consuming, limiting their practical utility. In this study, we introduce a novel deep learning method, CPIScore, which leverages the capabilities of Transformer and Graph Convolutional Networks (GCN) to enhance the prediction of protein-ligand binding affinity. CPIScore utilizes the Transformer architecture to capture comprehensive global contexts of protein and ligand sequences, while the GCN component effectively extracts local features from small molecular graphs. Our results demonstrate that CPIScore surpasses both traditional machine learning and other deep learning models in accuracy, achieving a Pearson's <i>r</i> of 0.74 on our test set. Furthermore, CPIScore has been validated across multiple targets, proving its ability to discern inhibitors from a diverse compound library with high enrichment rates. Notably, when applied to a generated focused library of compounds, CPIScore successfully identified six potent small-molecule inhibitors of ATR, which were tested experimentally and four small molecules exhibited inhibitory activity below ten nanomoles. These results highlight CPIScore's potential to significantly streamline and enhance the efficiency of drug discovery processes.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":" ","pages":"8809-8823"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142674418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Matini-Net: Versatile Material Informatics Research Framework for Feature Engineering and Deep Neural Network Design.","authors":"Myeonghun Lee, Taehyun Park, Kyoungmin Min","doi":"10.1021/acs.jcim.4c01676","DOIUrl":"10.1021/acs.jcim.4c01676","url":null,"abstract":"<p>In this study, we introduced Matini-Net, which is a versatile framework for feature engineering and automated architecture design for materials informatics research using deep neural networks. Matini-Net provides the flexibility to design feature-based, graph-based, and combinations of these models, accommodating both single- and multimodal model architectures. For validation, we performed a performance evaluation on the MatBench benchmarking dataset of five properties, targeting five types of regression architectures that can be designed using Matini-Net. When applied to each of the five material property datasets, the best model performance for the various architectures exhibited <i>R</i><sup>2</sup> > 0.84. This highlights the usefulness and flexibility of Matini-Net for accelerating materials discovery. Specifically, this framework was developed for researchers with limited experience in deep learning to easily apply it to research through automated feature engineering, hyperparameter tuning, and network construction. Moreover, Matini-Net improves the model interpretability by performing an importance analysis of the selected features. We believe that by employing Matini-Net, machine and deep learning can be applied more easily and effectively in various types of materials research.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":" ","pages":"8770-8783"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142680026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CACHE Challenge #1: Docking with GNINA Is All You Need","authors":"Ian Dunn*, Somayeh Pirhadi, Yao Wang, Smmrithi Ravindran, Carter Concepcion and David Ryan Koes*","doi":"10.1021/acs.jcim.4c01429","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01429","url":null,"abstract":"<p>We describe our winning submission to the first Critical Assessment of Computational Hit-Finding Experiments (CACHE) challenge. In this challenge, 23 participants employed a diverse array of structure-based methods to identify hits to a target with no known ligands. We utilized two methods, pharmacophore search and molecular docking, to identify our initial hit list and compounds for the hit expansion phase. Unlike many other participants, we limited ourselves to using docking scores in identifying and ranking hits. Our resulting best hit series tied for first place when evaluated by a panel of expert judges. Here, we report our top-performing open-source workflow and results.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":"64 24","pages":"9388–9396"},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.acs.org/doi/epdf/10.1021/acs.jcim.4c01429","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142870131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Graph Attention Network with Positive and Negative Attentions for Improved Interpretability: ISA-PN.","authors":"Jinyong Park, Minhi Han, Kiwoong Lee, Sungnam Park","doi":"10.1021/acs.jcim.4c01035","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01035","url":null,"abstract":"<p>With the advancement of deep learning (DL) methods in chemistry and materials science, the interpretability of DL models has become a critical issue in elucidating quantitative (molecular) structure-property relationships. Although attention mechanisms have been generally employed to explain the importance of molecular substructures that contribute to molecular properties, their interpretability remains limited. In this work, we introduce a versatile segmentation method and develop an interpretable subgraph attention (ISA) network with positive and negative streams (ISA-PN) to enhance the understanding of molecular structure-property relationships. The predictive performance of the ISA models was validated using data sets for aqueous solubility, lipophilicity, and melting temperature, with a particular focus on evaluating interpretability for the aqueous solubility data set. The ISA-PN model enables the quantification of the contributions of molecular substructures through positive and negative attention scores. Comparative analyses of the ISA, ISA-PN, and GC-Net (group contribution network) models demonstrate that the ISA-PN model significantly improves interpretability while maintaining similar accuracy levels. This study highlights the efficacy of the ISA-PN model in providing meaningful insights into the contributions of molecular substructures to molecular properties, thereby enhancing the interpretability of DL models in chemical applications.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142798647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}