{"title":"Machine Learning Methodologies Applied to Magnetocaloric Perovskites Discovery.","authors":"Luis E Castro-Anaya, Eduardo Marese, Jaime A Lozano, Guilherme F Peixer, Jader R Barbosa, Sergio Yesid Gómez González","doi":"10.1021/acs.jcim.4c01944","DOIUrl":"10.1021/acs.jcim.4c01944","url":null,"abstract":"<p><p>Traditionally, designing novel materials involves exploring new compositions guided by insights from previous work, relying on a trial-and-error approach, where continuous synthesis and characterization proceed until the properties meet the improvements. This method is inefficient due to the challenges of exploring vast chemical spaces. In this study, a machine-learning-based methodology is developed to assist the design from available data in the literature, allowing us to test in silico more than 1.2 million compositions. Two databases with 1227 inputs were created from published studies. Four machine learning (ML) models were trained over the feature sets using 517 compositional features (generated from 58 atomic properties) to predict magnetocaloric properties of perovskites: Curie temperature (<i>T</i><sub>C</sub>), magnetic entropy change (ME), and relative cooling power (RCP). The best model-feature combinations were used to explore the chemical space of lanthanum, praseodymium, and neodymium manganites, identifying composition trends for different temperature applications, including room temperature refrigeration, where the most suitable combinations of doping elements were highlighted. 
The study offers valuable guidelines for future research on magnetocaloric materials, and the methodology can be transferred to other perovskite-related material areas, such as catalysts and solar cell materials.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"1812-1825"},"PeriodicalIF":5.6,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11863371/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143363285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational Modeling of the Enzymatic Achmatowicz Rearrangement by Heme-Dependent Chloroperoxidase: Reaction Mechanism, Enantiopreference, Regioselectivity, and Substrate Specificity.","authors":"Fuqiang Chen, Chenghua Zhang, Shiqing Zhang, Wuyuan Zhang, Hao Su, Xiang Sheng","doi":"10.1021/acs.jcim.4c01658","DOIUrl":"10.1021/acs.jcim.4c01658","url":null,"abstract":"<p><p>The chloroperoxidase from <i>Caldariomyces fumago</i> (<i>Cf</i>CPO) catalyzes the oxidative ring expansion of α-heterofunctionalized furans via the Achmatowicz rearrangement, providing an elegant tool to convert furan rings into complex-prefunctionalized scaffolds. However, the mechanism of this transformation remains unclear. Herein, the <i>Cf</i>CPO-catalyzed reaction of <i>rac-</i>1-(2-furyl)ethanol (<b>1a</b>) is studied by quantum chemical calculations and molecular dynamics simulations. The calculations reveal that the conversion follows the general mechanism of the Achmatowicz reaction. Notably, the binding of <b>1a</b> to the enzyme's active site influences the Compound I (Cpd I) formation, and the (<i>R</i>)-<b>1a</b> enantiomer binding results in a lower barrier compared to (<i>S</i>)-<b>1a</b>, explaining the observed (<i>R</i>)-enantiopreference toward a racemic substrate. Additionally, due to the weaker steric hindrance between the porphyrin ring and substrate, the nucleophilic attack of Cpd I on the furan core of <b>1a</b> is preferred at the less-substituted C4=C5 bond, providing a rationale for the experimentally observed regioselectivity. Finally, the bottleneck residues in the substrate delivery channel and also the active site surroundings are proposed to be responsible for the substrate specificity of <i>Cf</i>CPO. 
This study lays a theoretical foundation for the rational design of new CPOs that catalyze the Achmatowicz rearrangement with a broader substrate spectrum or specific stereopreference.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"1928-1939"},"PeriodicalIF":5.6,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143062421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HiRXN: Hierarchical Attention-Based Representation Learning for Chemical Reaction.","authors":"Yahui Cao, Tao Zhang, Xin Zhao, Haotong Li","doi":"10.1021/acs.jcim.4c01787","DOIUrl":"10.1021/acs.jcim.4c01787","url":null,"abstract":"<p><p>In recent years, natural language processing (NLP) techniques, including large language modeling (LLM), have contributed significantly to advancements in organic chemistry research. Chemical reaction representations provide a link between NLP models and chemistry prediction tasks and enable the translation of complex chemical processes into a format that NLP models can understand and learn from. However, previous representation methods fail to adequately consider the hierarchical and structural information inherent in chemical reactions. Here, we propose a tool named HiRXN to learn the comprehensive representation of chemical reactions based on their hierarchical structure. In order to significantly enhance feature engineering for machine learning (ML) models, HiRXN develops an effective tokenization method called RXNTokenizer to capture atomic microenvironment features with multiradius. Then, the hierarchical attention network is used to integrate information from atomic microenvironment-level and molecule-level to accurately understand chemical reactions. The experimental results show that HiRXN is capable of representing chemical reactions and achieves remarkable performance in terms of reaction regression and classification prediction tasks. A web server has been developed to provide a specialized service that accepts Reaction SMILES as input and provides predicted results. 
The Web site is accessible at http://bdatju.com.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"1990-2002"},"PeriodicalIF":5.6,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143121775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating the Dynamics of the KCNN4 Channel: From the Determination of the Complete K<sup>+</sup> Permeation Pathway Across the Channel to Its Opening by PIP2.","authors":"Stephane Jedele, Benoit Allegrini, Hélène Guizouarn, Catherine Etchebest","doi":"10.1021/acs.jcim.4c01711","DOIUrl":"10.1021/acs.jcim.4c01711","url":null,"abstract":"<p><p>KCNN4 is a calcium (Ca)-activated potassium channel for which Ca<sup>2+</sup> sensitivity is conferred by calmodulin (CaM) that constitutively binds to the channel. Until the main part of the structure bound to CaM has been resolved, <i>in silico</i> studies had used homology models derived from the well characterized transmembrane domain of other K<sup>+</sup> channels, limiting the functional investigation to this particular region. Thus, how the regulatory domains of KCNN4 communicate with each other and where the possible gates are located across the complete structure are still not well understood. Here we present for the first time results obtained from the investigation of full-length models of the channel in different conformational states and molecular contexts using classical all-atom molecular dynamic simulations. The simulations covered two activated states (open and closed) and a preactivated state of the channel embedded in a simple membrane model and a model of red blood cell membrane, where the channel is functionally expressed <i>in vivo</i>. Surprisingly, the intracellular domain was refractory to the entrance of K<sup>+</sup>, whatever the state of the channel was, allowing the K<sup>+</sup> ions to enter and exit the channel only through two newly identified restrained diffusion spots. Inside the channel, the K<sup>+</sup> flux was controlled by the V282 residue closing the pore region when the CaM N-lobes were not bound. This flux was compatible with the passage of fully or partially hydrated K<sup>+</sup>, depending on the opening level. 
Finally, the presence of phosphatidylinositol-4,5-bisphosphate (PIP2), a well-known K<sup>+</sup>-channel modulator, in a putative binding site of KCNN4 clearly facilitated the opening of the V282 restriction. Thus, in addition to the elucidation of the possible complete K<sup>+</sup> permeation pathway throughout KCNN4, our results confirmed the direct activatory role of PIP2, associated with the channel opening, and provide a first insight into the architecture and the behavior of the complete intracellular region of KCNN4.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"2116-2128"},"PeriodicalIF":5.6,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143381326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structural Insight into the Inactive/Active States of 5-HT1AR and Molecular Mechanisms of Electric Fields in Modulating 5-HT1AR.","authors":"Lulu Guan, Bote Qi, Jingwang Tan, Yukang Chen, Yunxiang Sun, Qingwen Zhang, Yu Zou","doi":"10.1021/acs.jcim.4c02278","DOIUrl":"10.1021/acs.jcim.4c02278","url":null,"abstract":"<p><p>Probing the differences between inactive/active states of the serotonin 1A receptor (5-HT1AR) and the dynamic receptor conformations is vital for understanding signaling transduction pathways and diverse physiological responses. Here, we compared the conformational features between the inactive and active states of 5-HT1AR and explored the role of serotonin in the activation process of 5-HT1AR by using molecular dynamics (MD) simulations. The results show that the position of TM6 and the arrangements of key motifs exhibit distinctions in the inactive and active states of 5-HT1AR. The binding of serotonin to 5-HT1AR is mostly driven by hydrophobic, aromatic stacking, anion-π, and H-bonding interactions. We also performed additional MD simulations with electric fields (EFs) of 0.01 and 0.03 V/nm to investigate the effects of EFs on the conformation of the 5-HT1AR-serotonin complex. The conformational change of 5-HT1AR and the inward movement of TM6 are increased with the field strength, indicative of a dependence on the strength of the EF. The EF of 0.03 V/nm affects the binding behaviors of serotonin with 5-HT1AR and further disturbs the activation of 5-HT1AR by serotonin. 
This study first reveals atomic-level information about the distinct features between inactive and active states of 5-HT1AR and demonstrates the pivotal role of EF in modulating the 5-HT1AR-ligand complex.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"2066-2079"},"PeriodicalIF":5.6,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143381328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of Varying Velocities and Solvation Boxes on Alchemical Free-Energy Simulations.","authors":"Meiting Wang, Hao Jiang, Ulf Ryde","doi":"10.1021/acs.jcim.4c02236","DOIUrl":"10.1021/acs.jcim.4c02236","url":null,"abstract":"<p><p>Alchemical free-energy perturbation (FEP) is an accurate and thermodynamically stringent way to estimate relative energies for the binding of small ligands to biological macromolecules. It has repeatedly been pointed out that a single simulation normally stays near the starting point in phase space and therefore underestimates the uncertainty of the results. Therefore, it is better to run an ensemble of independent simulations. Traditionally, such an ensemble has been generated by using different starting velocities. We argue that it is better to use also other random choices made during the setup of the simulations, in particular the solvation of the solute. We show here that such solvent-induced independent simulations (SIS) sometimes give a larger standard deviation and slightly different results for the binding of 42 ligands to five different proteins, viz. human N-terminal bromodomain 4, the Leu99Ala mutant of T4 lysozyme, dihydrofolate reductase, blood-clotting factor Xa, and ferritin. SIS does not involve any increase in the time consumption. Therefore, we strongly recommend the use of SIS (in addition to different velocities) to start independent simulations. 
Other random or uncertain choices in the setup of the simulated systems, e.g., the selection of residues with alternative conformations or positions of added protons, may also be used to enhance the variation in independent simulations.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"2107-2115"},"PeriodicalIF":5.6,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11863368/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143062427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation of Hematocrit Volume Using Blood Glucose Concentration through Extreme Gradient Boosting Regressor Machine Learning Model.","authors":"Kirti Sharma, Pawan K Tiwari, S K Sinha","doi":"10.1021/acs.jcim.4c01423","DOIUrl":"10.1021/acs.jcim.4c01423","url":null,"abstract":"<p><p>Lifestyle diseases such as cardiovascular disorders, diabetes, etc. affect the physiological metabolism and become chronic upon negligence. Diabetes is one of the key factors that is interlinked with a plethora of diseases. Health management can be achieved through balanced diet, physical exercise, and periodic examination of blood glucose level and hematocrit volume. Our study developed a model to estimate the hematocrit volume (red blood cells) from the correlation of the glucose concentration obtained from a glucometer by employing machine learning techniques. This Article explores the prediction of hematocrit volume in whole blood by applying various machine learning (ML) models such as linear regression (LR), support vector regressor (SVR), decision tree (DT), random forest regressor (RFR), artificial neural network (ANN), and extreme gradient boosting regressor model (XGBoost). We used amperometric signals generated from an electrochemical glucose sensor or glucose strip, which produces current values on glucose concentration. We estimated the hematocrit volume via processing of the amperometric signals to enhance diagnostic capabilities with the least error in the field of biomedical signal processing. The ML models were trained on the data set comprising 80% training set and 20% testing set in the Python programming language. The models were evaluated based on the metrics such as R-squared (R<sup>2</sup>), mean squared error (MSE), and root mean squared error (RMSE) values, and their reliability was assessed through the three validation mechanisms, namely, the relative error, K-fold cross-validation, and analysis of confidence interval. 
We observed that the XGBoost regression results were comparatively better than the LR and ANN results as corroborated through reliability analysis. It was concluded that XGBoost demonstrated 15% relative error between actual and predicted data and 68% accuracy with 6% standard deviation in the prediction obtained via a 5-fold cross-validation technique. The XGBoost model demonstrates comparatively better performance in terms of flexibility in tuning and interpretability options, which make it suitable for the regression task in the predictive biomedical analytics.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"1736-1746"},"PeriodicalIF":5.6,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143187649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"metaCDA: A Novel Framework for CircRNA-Driven Drug Discovery Utilizing Adaptive Aggregation and Meta-Knowledge Learning.","authors":"Li Peng, Huaping Li, Sisi Yuan, Tao Meng, Yifan Chen, Xiangzheng Fu, Dongsheng Cao","doi":"10.1021/acs.jcim.4c02193","DOIUrl":"10.1021/acs.jcim.4c02193","url":null,"abstract":"<p><p>In the emerging field of RNA drugs, circular RNA (circRNA) has attracted much attention as a novel multifunctional therapeutic target. Delving deeper into the intricate interactions between circRNA and disease is critical for driving drug discovery efforts centered around circRNAs. Current computational methods face two significant limitations: a lack of aggregate information in heterogeneous graph networks and a lack of higher-order fusion information. To this end, we present a novel approach, metaCDA, which utilizes meta-knowledge and adaptive aggregate learning to improve the accuracy of circRNA and disease association predictions and addresses the limitations of both. We calculate multiple similarity measures between disease and circRNA, construct a heterogeneous graph based on these, and apply meta-networks to extract meta-knowledge from the heterogeneous graph, so that the constructed heterogeneous maps have adaptive contrast enhancement information. Then, we construct a nodal adaptive attention aggregation system, which integrates a multihead attention mechanism and a nodal adaptive attention aggregation mechanism, so as to achieve accurate capture of higher-order fusion information. 
We conducted extensive experiments, and the results show that metaCDA outperforms existing state-of-the-art models and can effectively predict disease-associated circRNA, opening up new prospects for circRNA-driven drug discovery.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"2129-2144"},"PeriodicalIF":5.6,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143404899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting and Explaining Yields with Machine Learning for Carboxylated Azoles and Beyond.","authors":"Kerrin Janssen, Jonny Proppe","doi":"10.1021/acs.jcim.4c02336","DOIUrl":"10.1021/acs.jcim.4c02336","url":null,"abstract":"<p><p>Carbon dioxide (CO<sub>2</sub>) can be transformed into valuable chemical building blocks, including C2-carboxylated 1,3-azoles, which have potential applications in pharmaceuticals, cosmetics, and pesticides. However, only a small fraction of the millions of available 1,3-azoles are carboxylated at the C2 position, highlighting significant opportunities for further research in the synthesis and application of these compounds. In this study, we utilized a supervised machine learning approach to predict reaction yields for a data set of amide-coupled C2-carboxylated 1,3-azoles. To facilitate molecular design, we integrated an interpretable heat-mapping algorithm named PIXIE (Predictive Insights and Xplainability for Informed chemical space Exploration). PIXIE visualizes the influence of molecular substructures on predicted yields by leveraging fingerprint bit importances, providing synthetic chemists with a powerful tool for the rational design of molecules. While heat mapping is an established technique, its integration with a machine-learning model tailored to the chemical space of C2-carboxylated 1,3-azoles represents a significant advancement. 
This approach not only enables targeted exploration of this underrepresented chemical space, fostering the discovery of new bioactive compounds, but also demonstrates the potential of combining these methods for broader applications in other chemical domains.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"1862-1872"},"PeriodicalIF":5.6,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11863374/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143363149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ProcessOptimizer, an Open-Source Python Package for Easy Optimization of Real-World Processes Using Bayesian Optimization: Showcase of Features and Example of Use.","authors":"Søren Bertelsen, Sigurd Carlsen, Søren Furbo, Morten Bormann Nielsen, Aksel Obdrup, Rolf Taaning","doi":"10.1021/acs.jcim.4c02240","DOIUrl":"10.1021/acs.jcim.4c02240","url":null,"abstract":"<p><p>ProcessOptimizer is a Python package designed to provide easy access to advanced machine learning techniques, specifically Bayesian optimization using, e.g., Gaussian processes. Aimed at experimentalist scientists and applicable to process and product optimizations in various fields, this package simplifies the optimization process, offering features such as benchmarking, noise addition/removal, multiobjective optimization, batch-mode operation, and comprehensive plotting features. The present publication focuses on ease of use by presenting an optimization of a chemical reaction to produce a specific color, such as leaf green.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":"1702-1707"},"PeriodicalIF":5.6,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11863379/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143370108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}