Journal of Chemical Information and Modeling: Latest Articles

Data-Driven Insights into Porphyrin Geometry: Interpretable AI for Non-Planarity and Aromaticity Analyses
IF 5.6 Zone 2 Chemistry
Journal of Chemical Information and Modeling Pub Date: 2025-04-20 DOI: 10.1021/acs.jcim.5c00518
Shachar Fite, Zeev Gross
Abstract: Porphyrins are involved in numerous and very different chemical and biological processes, due to the sensitivity of their application-relevant properties to subtle structural changes. Applying modern machine learning methodology is very appealing for discovering structure-activity relationships that can be used for design of tailor-made porphyrins for specific purposes. For achieving this goal, a high-quality set consisting of 425 metal porphyrins was established via curation of 7590 porphyrin structures from the Cambridge crystallographic database. Using data-driven techniques for analyzing nonplanarity and "structural aromaticity" allowed for validation of common knowledge in the field as well as discovery of new relations. Aromaticity was found to be influenced differently by distinct nonplanar distortions. Nonplanarity is more sensitive to macrocycle substitutions than to metal or axial ligand effects, while ruffled distortions are dominated by axial ligand size and metal properties. These findings offer new insights into structure-property relationships in porphyrins, providing a data-driven foundation for targeted synthesis to tune aromaticity and nonplanarity. Despite data set limitations, this work demonstrates the value of machine learning in uncovering complex chemical trends.
Citations: 0
Structural Systems Biology Toolkit (SSBtoolkit): From Molecular Structure to Subcellular Signaling Pathways
IF 5.6 Zone 2 Chemistry
Journal of Chemical Information and Modeling Pub Date: 2025-04-18 DOI: 10.1021/acs.jcim.5c00165
Rui Pedro Ribeiro, Jonas Goßen, Giulia Rossetti, Alejandro Giorgetti
Abstract: Here, we introduce the Structural Systems Biology (SSB) toolkit, a Python library that integrates structural macromolecular data with systems biology simulations to model signal-transduction pathways of G-protein-coupled receptors (GPCRs). Our framework streamlines simulation and analysis of the mathematical models of GPCR cellular pathways, facilitating the exploration of the signal-transduction kinetics induced by ligand-GPCR interactions: the dose-response of the ligand can be modeled, along with the corresponding change over time in the concentration of other signaling molecular species, such as [Ca2+] or [cAMP]. The SSB toolkit makes it easy to investigate the subcellular effects of ligand binding on receptor activation, even in the presence of genetic mutations, thereby enhancing our understanding of the intricate relationship between ligand-target interactions at the molecular level and the higher-level cellular and (patho)physiological response mechanisms.
Citations: 0
QM/MM Study of the Metabolic Oxidation of 6',7'-Dihydroxybergamottin Catalyzed by Human CYP3A4: Preferential Formation of the γ-Ketoenal Product in Mechanism-Based Inactivation
IF 5.6 Zone 2 Chemistry
Journal of Chemical Information and Modeling Pub Date: 2025-04-18 DOI: 10.1021/acs.jcim.5c00259
Junfang Yan, Hajime Hirao
Abstract: 6',7'-Dihydroxybergamottin (DHB), a natural furanocoumarin found in grapefruit, is known to cause mechanism-based inactivation (MBI) of several cytochrome P450 enzymes (P450s) in humans, including CYP3A4. Despite its pharmacological significance, the precise microscopic mechanisms underlying the P450 MBI induced by DHB remain unclear. To address this, we employed molecular docking and molecular dynamics simulations to identify a plausible catalytic binding pose of DHB within CYP3A4. Subsequent quantum mechanics/molecular mechanics (QM/MM) calculations explored two possible reaction pathways (A and B). Path A involves the attack by compound I (Cpd I) at the C5 position of the furan moiety, leading to γ-ketoenal formation, while Path B targets the C4 position, yielding an epoxide. Path A exhibits a much lower activation energy barrier, indicating a strong kinetic preference. Additionally, the γ-ketoenal is thermodynamically more stable than the epoxide. Thus, even if the epoxide forms initially, it is likely to rearrange into the γ-ketoenal, either within the enzyme or in aqueous solution. Collectively, these findings suggest that the γ-ketoenal is the sole ultimate product of DHB oxidation by CYP3A4. This study provides valuable insights into CYP3A4 inactivation by grapefruit constituents and advances our understanding of food-drug interactions.
Citations: 0
Physics-Informed Multifidelity Gaussian Process: Modeling the Effect of Water and Temperature on the Viscosity of a Deep Eutectic Solvent
IF 5.6 Zone 2 Chemistry
Journal of Chemical Information and Modeling Pub Date: 2025-04-16 DOI: 10.1021/acs.jcim.5c00157
Volume 65, Issue 8, pp 3999–4009
Maximilian Fleck, Samir Darouich, Jürgen Pleiss, Niels Hansen and Marcelle B. M. Spera*
Abstract: Knowledge of shear viscosity as a function of temperature and composition of an aqueous deep eutectic solvent mixture is essential for process design but can be highly challenging and costly to measure. The present work proposes to combine a small set of experimentally determined viscosities with a small set of simulated values within a linear multifidelity approach to predict the dependency of shear viscosity on temperature and composition. This method provides a simple approach that requires a physics-based transformation of viscosity data prior to training, without the need for additional data such as densities. This lowers experimental cost and reduces the number of experiments and simulations required to characterize a specific system. The data-driven component of the model does not concern the viscosity itself but rather the excess free energy term within the framework of a mixture viscosity model according to Eyring's absolute rate theory. Moreover, we illustrate the application of kernel-based machine learning approaches to daily research questions where data availability is limited compared to the data set size typically required for neural networks.
Citations: 0
Quantifying Acyl Chain Interdigitation in Simulated Bilayers via Direct Transbilayer Interactions
IF 5.6 Zone 2 Chemistry
Journal of Chemical Information and Modeling Pub Date: 2025-04-16 DOI: 10.1021/acs.jcim.4c02287
Volume 65, Issue 8, pp 3879–3885
Emily H. Chaisson, Frederick A. Heberle* and Milka Doktorova*
Abstract: In a lipid bilayer, the interactions between the lipid hydrocarbon chains from opposing leaflets can influence membrane properties. These interactions include the phenomenon of interdigitation, in which an acyl chain of one leaflet extends past the bilayer midplane and into the opposing leaflet. While static interdigitation is well understood in gel-phase bilayers from X-ray diffraction measurements, much less is known about dynamic interdigitation in fluid phases. In this regard, atomistic molecular dynamics simulations can provide mechanistic information on interleaflet interactions that can be used to generate experimentally testable hypotheses. To address limitations of existing computational methodologies that provide results that are either indirect or averaged over time and space, here we introduce three novel ways of quantifying the extent of chain interdigitation. Our protocols include the analysis of instantaneous interactions at the level of individual carbon atoms, thus providing temporal and spatial resolution for a more nuanced picture of dynamic interdigitation. We compare the methods on bilayers composed of lipids with an equal total number of carbon atoms, but different mismatches between the sn-1 and sn-2 chain lengths. We find that these metrics, which are based on freely available software packages and are easy to implement, provide complementary details that help characterize various features of lipid–lipid contacts at the bilayer midplane. The new frameworks thus allow for a deeper look at fundamental molecular mechanisms underlying bilayer structure and dynamics and present a valuable expansion of the membrane biophysics toolkit.
Open access PDF: https://pubs.acs.org/doi/epdf/10.1021/acs.jcim.4c02287
Citations: 0
Ligand-Based Drug Discovery Leveraging State-of-the-Art Machine Learning Methodologies Exemplified by Cdr1 Inhibitor Prediction
IF 5.6 Zone 2 Chemistry
Journal of Chemical Information and Modeling Pub Date: 2025-04-16 DOI: 10.1021/acs.jcim.5c00374
Volume 65, Issue 8, pp 4027–4042
The-Chuong Trinh, Pierre Falson, Viet-Khoa Tran-Nguyen* and Ahcène Boumendjel*
Abstract: Artificial intelligence (AI) is revolutionizing drug discovery with unprecedented speed and efficiency. In computer-aided drug design, structure-based and ligand-based methodologies are the main driving forces for innovation. In cases where no experimental structure or high-confidence homology/AlphaFold-predicted model of the target is available in 3D, ligand-based strategies are generally preferable. Here, we aim to develop and evaluate new predictive AI models for ligand-based drug discovery. To illustrate our workflow, we propose, as an example, an ensemble classification model for Cdr1 inhibitor prediction. We leverage target-specific experimental data from different sources, various molecular feature types, and multiple state-of-the-art machine learning (ML) algorithms alongside a multi-instance 3D graph neural network (multiple conformations of a single molecule are considered). Bayesian hyperparameter tuning, stacked generalization, and soft voting are involved in our workflow. The final target-specific ensemble model benefits from the classification and screening power of those constituting it. On an external test set structurally dissimilar to the training data, its average precision is 0.755, its F1-score is 0.714, the area under the receiver operating characteristic curve is 0.884, and the balanced accuracy is 0.799. It gives a low false positive rate of 0.1236 on another test set outside the training chemical space, indicating its ability to avoid false positives. The present work highlights the potential of stacking ensemble ML and offers a rigorous general workflow to build ligand-based predictive AI models for other targets.
Citations: 0
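The abstract above names soft voting as one step of its ensemble workflow. As a rough illustration only, the general technique can be sketched as follows; this is not the authors' code, and the function names `soft_vote` and `classify`, the three "models", and their probabilities are hypothetical values invented for the example.

```python
def soft_vote(prob_lists, weights=None):
    """Combine per-model positive-class probabilities by (weighted) averaging.

    prob_lists: one list of probabilities per model, all of equal length.
    weights:    optional per-model weights; equal weights if omitted.
    """
    n_models = len(prob_lists)
    weights = weights or [1.0] * n_models
    total = sum(weights)
    # zip(*prob_lists) groups the probabilities that belong to the same molecule.
    return [
        sum(w * p for w, p in zip(weights, per_molecule)) / total
        for per_molecule in zip(*prob_lists)
    ]


def classify(probabilities, threshold=0.5):
    """Label a molecule as a predicted inhibitor (1) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical probabilities from three base classifiers for three molecules.
model_probs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.4, 0.3],
    [0.7, 0.3, 0.7],
]
ensemble = soft_vote(model_probs)
labels = classify(ensemble)
```

Averaging probabilities rather than hard labels lets a confident base model outweigh two lukewarm ones, which is the usual reason for choosing soft over hard voting.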
Addressing Imbalanced Classification Problems in Drug Discovery and Development Using Random Forest, Support Vector Machine, AutoGluon-Tabular, and H2O AutoML
IF 5.6 Zone 2 Chemistry
Journal of Chemical Information and Modeling Pub Date: 2025-04-15 DOI: 10.1021/acs.jcim.5c00023
Ayush Garg, Narayanan Ramamurthi, Shyam Sundar Das
Abstract: The classification models built on class imbalanced data sets tend to prioritize the accuracy of the majority class, and thus, the minority class generally has a higher misclassification rate. Different techniques are available to address the class imbalance in classification models and can be categorized as data-level, algorithm-level, and hybrid methods. But to the best of our knowledge, an in-depth analysis of the performance of these techniques against the class ratio is not available in the literature. We have addressed these shortcomings in this study and have performed a detailed analysis of the performance of four different techniques to address imbalanced class distribution using machine learning (ML) methods and AutoML tools. To carry out our study, we have selected four such techniques, namely (a) threshold optimization using (i) GHOST and (ii) the area under the precision-recall curve (AUPR), (b) the internal balancing method of AutoML and class-weight of machine learning methods, and (c) data balancing using SMOTETomek, and generated 27 data sets considering nine different class ratios (i.e., the ratio of the positive class and total samples) from three data sets that belong to the drug discovery and development field. We have employed random forest (RF) and support vector machine (SVM) as representatives of ML classifiers and AutoGluon-Tabular (version 0.6.1) and H2O AutoML (version 3.40.0.4) as representatives of AutoML tools. The important findings of our studies are as follows: (i) there is no effect of threshold optimization on ranking metrics such as AUC and AUPR, but AUC and AUPR get affected by class-weighting and SMOTETomek; (ii) for ML methods RF and SVM, significant percentage improvement up to 375, 33.33, and 450 over all the data sets can be achieved, respectively, for F1 score, MCC, and balanced accuracy, which are suitable for performance evaluation of imbalanced data sets; (iii) for AutoML libraries AutoGluon-Tabular and H2O AutoML, significant percentage improvement up to 383.33, 37.25, and 533.33 over all the data sets can be achieved, respectively, for F1 score, MCC, and balanced accuracy; (iv) the general pattern of percentage improvement in balanced accuracy is that the percentage improvement increases when the class ratio is systematically decreased from 0.5 to 0.1; in the case of F1 score and MCC, maximum improvement is achieved at the class ratio of 0.3; (v) for both ML and AutoML with balancing, it is observed that any individual class-balancing technique does not outperform all other methods on a significantly higher number of data sets based on F1 score; (vi) the three external balancing techniques combined outperformed the internal balancing methods of the ML and AutoML; (vii) AutoML tools perform as well as the ML models and in some cases perform even better for handling imbalanced classification when applied with imbalance handling techniques. In summary, exploration of multiple data balancing techniques is recom
Citations: 0
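The abstract above evaluates threshold optimization using F1 score, MCC, and balanced accuracy. A minimal sketch of the generic idea follows: scan candidate decision thresholds and keep the one that maximizes a chosen metric. This is a plain exhaustive scan, not the GHOST procedure the authors use, and the toy labels, scores, and all function names are assumptions made for illustration.

```python
from math import sqrt


def confusion(y_true, scores, threshold):
    """Return (tp, fp, tn, fn) when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1(tp, fp, tn, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def balanced_accuracy(tp, fp, tn, fn):
    # Mean of sensitivity and specificity, so both classes count equally.
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return (tpr + tnr) / 2


def mcc(tp, fp, tn, fn):
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(y_true, scores, metric=f1):
    """Try every observed score as a threshold and return the one maximizing the metric."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: metric(*confusion(y_true, scores, t)))


# Toy imbalanced data (class ratio 0.2): eight negatives, two positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.45, 0.40, 0.60]

default_f1 = f1(*confusion(y_true, scores, 0.5))
tuned = best_threshold(y_true, scores, metric=f1)
tuned_f1 = f1(*confusion(y_true, scores, tuned))
```

On this toy set the default cutoff of 0.5 misses one of the two positives, while the tuned cutoff recovers it at the cost of one false positive. Because thresholding only changes labels and not the ordering of scores, ranking metrics such as AUC are unaffected, consistent with finding (i) in the abstract.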
Improved Solubility Predictions in scCO2 Using Thermodynamics-Informed Machine Learning Models
IF 5.6 Zone 2 Chemistry
Journal of Chemical Information and Modeling Pub Date: 2025-04-15 DOI: 10.1021/acs.jcim.5c00432
Volume 65, Issue 8, pp 4043–4056
Dmitriy M. Makarov*, Nikolai N. Kalikin, Yury A. Budkov*, Pavel Gurikov, Sergey E. Kruchinin, Abolghasem Jouyban and Michael G. Kiselev
Abstract: Accurate solubility prediction in supercritical carbon dioxide (scCO2) is crucial for optimizing experimental design by eliminating unnecessary and costly trials at an early stage, thereby streamlining the workflow. A comprehensive solubility database containing 31,975 records has been compiled, providing a foundation for developing predictive models applicable to a diverse class of chemical compounds, with a particular focus on drug-like substances. In this study, we propose a domain-aware machine learning approach that incorporates thermodynamic properties governing phase transitions to solubility predictions in scCO2. Predictive models were developed using the CatBoost algorithm and a graph-based architecture employing directed message passing to identify the most effective approach. Furthermore, auxiliary properties of the solute, including melting point, critical parameters, enthalpy of vaporization, and Gibbs free energy of solvation, were predicted as part of this work. The findings underscore the efficacy of incorporating domain-specific thermodynamic features to enhance the predictive accuracy of scCO2 solubility modeling. The interpretation and the applicability domain assessment have confirmed the qualitative selection of the employed descriptors, demonstrating their ability to generalize to unique compounds that fall outside the defined domain.
Citations: 0