{"title":"Data-Driven Insights into Porphyrin Geometry: Interpretable AI for Non-Planarity and Aromaticity Analyses.","authors":"Shachar Fite,Zeev Gross","doi":"10.1021/acs.jcim.5c00518","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00518","url":null,"abstract":"Porphyrins are involved in numerous and very different chemical and biological processes, due to the sensitivity of their application-relevant properties to subtle structural changes. Applying modern machine learning methodology is very appealing for discovering structure-activity relationships that can be used for design of tailor-made porphyrins for specific purposes. For achieving this goal, a high-quality set consisting of 425 metal porphyrins was established via curation of 7590 porphyrin structures from the Cambridge crystallographic database. Using data-driven techniques for analyzing nonplanarity and \"structural aromaticity\" allowed for validation of common knowledge in the field as well as discovery of new relations. Aromaticity was found to be influenced differently by distinct nonplanar distortions. Nonplanarity is more sensitive to macrocycle substitutions than to metal or axial ligand effects, while ruffled distortions are dominated by axial ligand size and metal properties. These findings offer new insights into structure-property relationships in porphyrins, providing a data-driven foundation for targeted synthesis to tune aromaticity and nonplanarity. 
Despite data set limitations, this work demonstrates the value of machine learning in uncovering complex chemical trends.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"11 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143857338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structural Systems Biology Toolkit (SSBtoolkit): From Molecular Structure to Subcellular Signaling Pathways.","authors":"Rui Pedro Ribeiro,Jonas Goßen,Giulia Rossetti,Alejandro Giorgetti","doi":"10.1021/acs.jcim.5c00165","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00165","url":null,"abstract":"Here, we introduce the Structural Systems Biology (SSB) toolkit, a Python library that integrates structural macromolecular data with systems biology simulations to model signal-transduction pathways of G-protein-coupled receptors (GPCRs). Our framework streamlines simulation and analysis of the mathematical models of GPCRs cellular pathways, facilitating the exploration of the signal-transduction kinetics induced by ligand-GPCR interactions: the dose-response of the ligand can be modeled, along with the corresponding change in the concentration of other signaling molecular species over time, like for instance [Ca2+] or [cAMP]. SSB toolkit brings to light the possibility of easily investigating the subcellular effects of ligand binding on receptor activation, even in the presence of genetic mutations, thereby enhancing our understanding of the intricate relationship between ligand-target interactions at the molecular level and the higher-level cellular and (patho)physiological response mechanisms.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"31 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143849384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"QM/MM Study of the Metabolic Oxidation of 6',7'-Dihydroxybergamottin Catalyzed by Human CYP3A4: Preferential Formation of the γ-Ketoenal Product in Mechanism-Based Inactivation.","authors":"Junfang Yan,Hajime Hirao","doi":"10.1021/acs.jcim.5c00259","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00259","url":null,"abstract":"6',7'-Dihydroxybergamottin (DHB), a natural furanocoumarin found in grapefruit, is known to cause mechanism-based inactivation (MBI) of several cytochrome P450 enzymes (P450s) in humans, including CYP3A4. Despite its pharmacological significance, the precise microscopic mechanisms underlying the P450 MBI induced by DHB remain unclear. To address this, we employed molecular docking and molecular dynamics simulations to identify a plausible catalytic binding pose of DHB within CYP3A4. Subsequent quantum mechanics/molecular mechanics (QM/MM) calculations explored two possible reaction pathways (A and B). Path A involves the attack by compound I (Cpd I) at the C5 position of the furan moiety, leading to γ-ketoenal formation, while Path B targets the C4 position, yielding an epoxide. Path A exhibits a much lower activation energy barrier, indicating a strong kinetic preference. Additionally, the γ-ketoenal is thermodynamically more stable than the epoxide. Thus, even if the epoxide forms initially, it is likely to rearrange into the γ-ketoenal, either within the enzyme or in aqueous solution. Collectively, these findings suggest that the γ-ketoenal is the sole ultimate product of DHB oxidation by CYP3A4. 
This study provides valuable insights into CYP3A4 inactivation by grapefruit constituents and advances our understanding of food-drug interactions.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"17 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143851033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ligand-Based Drug Discovery Leveraging State-of-the-Art Machine Learning Methodologies Exemplified by Cdr1 Inhibitor Prediction.","authors":"The-Chuong Trinh,Pierre Falson,Viet-Khoa Tran-Nguyen,Ahcène Boumendjel","doi":"10.1021/acs.jcim.5c00374","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00374","url":null,"abstract":"Artificial intelligence (AI) is revolutionizing drug discovery with unprecedented speed and efficiency. In computer-aided drug design, structure-based and ligand-based methodologies are the main driving forces for innovation. In cases where no experimental structure or high-confidence homology/AlphaFold-predicted model of the target is available in 3D, ligand-based strategies are generally preferable. Here, we aim to develop and evaluate new predictive AI models for ligand-based drug discovery. To illustrate our workflow, we propose, as an example, an ensemble classification model for Cdr1 inhibitor prediction. We leverage target-specific experimental data from different sources, various molecular feature types, and multiple state-of-the-art machine learning (ML) algorithms alongside a multi-instance 3D graph neural network (multiple conformations of a single molecule are considered). Bayesian hyperparameter tuning, stacked generalization, and soft voting are involved in our workflow. The final target-specific ensemble model benefits from the classification and screening power of those constituting it. On an external test set structurally dissimilar to the training data, its average precision is 0.755, its F1-score is 0.714, the area under the receiver operating characteristic curve is 0.884, and the balanced accuracy is 0.799. It gives a low false positive rate of 0.1236 on another test set outside the training chemical space, indicating its ability to avoid false positives. 
The present work highlights the potential of stacking ensemble ML and offers a rigorous general workflow to build ligand-based predictive AI models for other targets.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"36 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143846330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Physics-Informed Multifidelity Gaussian Process: Modeling the Effect of Water and Temperature on the Viscosity of a Deep Eutectic Solvent","authors":"Maximilian Fleck, Samir Darouich, Jürgen Pleiss, Niels Hansen and Marcelle B. M. Spera*","doi":"10.1021/acs.jcim.5c00157","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00157","url":null,"abstract":"Knowledge of shear viscosity as function of temperature and composition of an aqueous deep eutectic solvent mixture is essential for process design but can be highly challenging and costly to measure. The present work proposes to combine a small set of experimentally determined viscosities with a small set of simulated values within a linear multifidelity approach to predict the dependency of shear viscosity on temperature and composition. This method provides a simple approach that requires a physics-based transformation of viscosity data prior to training, without the need for additional data such as densities. This allows reduction in cost with experiments and reduces the number of experiments and simulations required to characterize a specific system. The data-driven component of the model does not concern the viscosity itself but rather the excess free energy term within the framework of a mixture viscosity model according to Eyring’s absolute rate theory. 
Moreover, we illustrate the application of kernel-based machine learning approaches to daily research questions where data availability is limited compared to the data set size typically required for neural networks.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"65 8","pages":"3999–4009"},"PeriodicalIF":5.6,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143878263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Physics-Informed Multifidelity Gaussian Process: Modeling the Effect of Water and Temperature on the Viscosity of a Deep Eutectic Solvent.","authors":"Maximilian Fleck,Samir Darouich,Jürgen Pleiss,Niels Hansen,Marcelle B M Spera","doi":"10.1021/acs.jcim.5c00157","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00157","url":null,"abstract":"Knowledge of shear viscosity as function of temperature and composition of an aqueous deep eutectic solvent mixture is essential for process design but can be highly challenging and costly to measure. The present work proposes to combine a small set of experimentally determined viscosities with a small set of simulated values within a linear multifidelity approach to predict the dependency of shear viscosity on temperature and composition. This method provides a simple approach that requires a physics-based transformation of viscosity data prior to training, without the need for additional data such as densities. This allows reduction in cost with experiments and reduces the number of experiments and simulations required to characterize a specific system. The data-driven component of the model does not concern the viscosity itself but rather the excess free energy term within the framework of a mixture viscosity model according to Eyring's absolute rate theory. 
Moreover, we illustrate the application of kernel-based machine learning approaches to daily research questions where data availability is limited compared to the data set size typically required for neural networks.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"45 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143846332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantifying Acyl Chain Interdigitation in Simulated Bilayers via Direct Transbilayer Interactions","authors":"Emily H. Chaisson, Frederick A. Heberle* and Milka Doktorova*","doi":"10.1021/acs.jcim.4c02287","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c02287","url":null,"abstract":"In a lipid bilayer, the interactions between the lipid hydrocarbon chains from opposing leaflets can influence membrane properties. These interactions include the phenomenon of interdigitation, in which an acyl chain of one leaflet extends past the bilayer midplane and into the opposing leaflet. While static interdigitation is well understood in gel-phase bilayers from X-ray diffraction measurements, much less is known about dynamic interdigitation in fluid phases. In this regard, atomistic molecular dynamics simulations can provide mechanistic information on interleaflet interactions that can be used to generate experimentally testable hypotheses. To address limitations of existing computational methodologies that provide results that are either indirect or averaged over time and space, here we introduce three novel ways of quantifying the extent of chain interdigitation. Our protocols include the analysis of instantaneous interactions at the level of individual carbon atoms, thus providing temporal and spatial resolution for a more nuanced picture of dynamic interdigitation. We compare the methods on bilayers composed of lipids with an equal total number of carbon atoms, but different mismatches between the sn-1 and sn-2 chain lengths. We find that these metrics, which are based on freely available software packages and are easy to implement, provide complementary details that help characterize various features of lipid–lipid contacts at the bilayer midplane. 
The new frameworks thus allow for a deeper look at fundamental molecular mechanisms underlying bilayer structure and dynamics and present a valuable expansion of the membrane biophysics toolkit.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"65 8","pages":"3879–3885"},"PeriodicalIF":5.6,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.acs.org/doi/epdf/10.1021/acs.jcim.4c02287","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143878260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ligand-Based Drug Discovery Leveraging State-of-the-Art Machine Learning Methodologies Exemplified by Cdr1 Inhibitor Prediction","authors":"The-Chuong Trinh, Pierre Falson, Viet-Khoa Tran-Nguyen* and Ahcène Boumendjel*","doi":"10.1021/acs.jcim.5c00374","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00374","url":null,"abstract":"Artificial intelligence (AI) is revolutionizing drug discovery with unprecedented speed and efficiency. In computer-aided drug design, structure-based and ligand-based methodologies are the main driving forces for innovation. In cases where no experimental structure or high-confidence homology/AlphaFold-predicted model of the target is available in 3D, ligand-based strategies are generally preferable. Here, we aim to develop and evaluate new predictive AI models for ligand-based drug discovery. To illustrate our workflow, we propose, as an example, an ensemble classification model for Cdr1 inhibitor prediction. We leverage target-specific experimental data from different sources, various molecular feature types, and multiple state-of-the-art machine learning (ML) algorithms alongside a multi-instance 3D graph neural network (multiple conformations of a single molecule are considered). Bayesian hyperparameter tuning, stacked generalization, and soft voting are involved in our workflow. The final target-specific ensemble model benefits from the classification and screening power of those constituting it. On an external test set structurally dissimilar to the training data, its average precision is 0.755, its F1-score is 0.714, the area under the receiver operating characteristic curve is 0.884, and the balanced accuracy is 0.799. It gives a low false positive rate of 0.1236 on another test set outside the training chemical space, indicating its ability to avoid false positives. 
The present work highlights the potential of stacking ensemble ML and offers a rigorous general workflow to build ligand-based predictive AI models for other targets.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"65 8","pages":"4027–4042"},"PeriodicalIF":5.6,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143878297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Addressing Imbalanced Classification Problems in Drug Discovery and Development Using Random Forest, Support Vector Machine, AutoGluon-Tabular, and H2O AutoML.","authors":"Ayush Garg,Narayanan Ramamurthi,Shyam Sundar Das","doi":"10.1021/acs.jcim.5c00023","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00023","url":null,"abstract":"The classification models built on class imbalanced data sets tend to prioritize the accuracy of the majority class, and thus, the minority class generally has a higher misclassification rate. Different techniques are available to address the class imbalance in classification models and can be categorized as data-level, algorithm-level, and hybrid methods. But to the best of our knowledge, an in-depth analysis of the performance of these techniques against the class ratio is not available in the literature. We have addressed these shortcomings in this study and have performed a detailed analysis of the performance of four different techniques to address imbalanced class distribution using machine learning (ML) methods and AutoML tools. To carry out our study, we have selected four such techniques─(a) threshold optimization using (i) GHOST and (ii) the area under the precision-recall curve (AUPR) curve, (b) internal balancing method of AutoML and class-weight of machine learning methods, and (c) data balancing using SMOTETomek─and generated 27 data sets considering nine different class ratios (i.e., the ratio of the positive class and total samples) from three data sets that belong to the drug discovery and development field. We have employed random forest (RF) and support vector machine (SVM) as representatives of ML classifier and AutoGluon-Tabular (version 0.6.1) and H2O AutoML (version 3.40.0.4) as representatives of AutoML tools. 
The important findings of our studies are as follows: (i) there is no effect of threshold optimization on ranking metrics such as AUC and AUPR, but AUC and AUPR get affected by class-weighting and SMOTETomek; (ii) for ML methods RF and SVM, significant percentage improvement up to 375, 33.33, and 450 over all the data sets can be achieved, respectively, for F1 score, MCC, and balanced accuracy, which are suitable for performance evaluation of imbalanced data sets; (iii) for AutoML libraries AutoGluon-Tabular and H2O AutoML, significant percentage improvement up to 383.33, 37.25, and 533.33 over all the data sets can be achieved, respectively, for F1 score, MCC, and balanced accuracy; (iv) the general pattern of percentage improvement in balanced accuracy is that the percentage improvement increases when the class ratio is systematically decreased from 0.5 to 0.1; in the case of F1 score and MCC, maximum improvement is achieved at the class ratio of 0.3; (v) for both ML and AutoML with balancing, it is observed that any individual class-balancing technique does not outperform all other methods on a significantly higher number of data sets based on F1 score; (vi) the three external balancing techniques combined outperformed the internal balancing methods of the ML and AutoML; (vii) AutoML tools perform as well as the ML models and in some cases perform even better for handling imbalanced classification when applied with imbalance handling techniques. 
In summary, exploration of multiple data balancing techniques is recom","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"50 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143836483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Solubility Predictions in scCO2 Using Thermodynamics-Informed Machine Learning Models","authors":"Dmitriy M. Makarov*, Nikolai N. Kalikin, Yury A. Budkov*, Pavel Gurikov, Sergey E. Kruchinin, Abolghasem Jouyban and Michael G. Kiselev","doi":"10.1021/acs.jcim.5c00432","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00432","url":null,"abstract":"Accurate solubility prediction in supercritical carbon dioxide (scCO2) is crucial for optimizing experimental design by eliminating unnecessary and costly trials at an early stage, thereby streamlining the workflow. A comprehensive solubility database containing 31,975 records has been compiled, providing a foundation for developing predictive models applicable to a diverse class of chemical compounds, with a particular focus on drug-like substances. In this study, we propose a domain-aware machine learning approach that incorporates thermodynamic properties governing phase transitions to solubility predictions in scCO2. Predictive models were developed using the CatBoost algorithm and a graph-based architecture employing directed message passing to identify the most effective approach. Furthermore, auxiliary properties of the solute, including melting point, critical parameters, enthalpy of vaporization, and Gibbs free energy of solvation, were predicted as part of this work. The findings underscore the efficacy of incorporating domain-specific thermodynamic features to enhance the predictive accuracy of scCO2 solubility modeling. 
The interpretation and the applicability domain assessment have confirmed the qualitative selection of the employed descriptors, demonstrating their ability to generalize to unique compounds that fall outside the defined domain.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"65 8","pages":"4043–4056"},"PeriodicalIF":5.6,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143878271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}