{"title":"A Scalable and Generalizable Method to Minimize Solvent Interference in Identification of Chemical Reaction Networks from Spectroscopic Data.","authors":"Kuldeep Singh,Karthik Srinivasan,Ziting Sun,Jing Liu,Vinay Prasad","doi":"10.1021/acs.jcim.5c01553","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c01553","url":null,"abstract":"Challenges such as varying levels of solvent interference that obscure spectral bands restrict the applicability and direct adoption of spectroscopic techniques for the analysis and characterization of complex reacting systems. In this work, we develop a generic and scalable method to minimize solvent interference on the spectroscopic signatures of reacting mixtures under varying process conditions without prior information about the constituents. The method frames solvent effect minimization as a tensorial factorization problem to segregate the solute and solvent contributions (i.e., latent factors) across each data dimension. We employ two distinct methodologies, named the direct and orthogonal approaches, to distinguish between the solute and the solvent latent factors. Comparative analyses on four case studies with spectroscopic process data show the efficiency of the proposed methods in minimizing and extracting useful information from obscured bands. The extracted solvent-free latent factors can be reconstructed to provide solvent-free spectroscopic data or directly applied to tasks such as mixture characterization, impurity detection, predictive modeling, and data mining. In this work, we apply them to generate plausible reaction networks for various chemical systems. 
The proposed approaches generalize to any solvent and adapt to the large process data sets typically found in chemical process industries.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"39 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145103507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Auxiliary Discrminator Sequence Generative Adversarial Networks for Few Sample Molecule Generation.","authors":"Haocheng Tang, Jing Long, Beihong Ji, Junmei Wang","doi":"10.1021/acs.jcim.5c01737","DOIUrl":"10.1021/acs.jcim.5c01737","url":null,"abstract":"In this work, we introduce auxiliary discriminator sequence generative adversarial networks (ADSeqGAN), a novel approach for molecular generation in small-sample data sets. Traditional generative models often struggle with limited training data, particularly in drug discovery, where molecular data sets for specific therapeutic targets, such as nucleic acid binders and central nervous system (CNS) drugs, are scarce. ADSeqGAN addresses this challenge by integrating an auxiliary random forest classifier as an additional discriminator into the GAN framework, significantly improving molecular generation quality and class specificity. Our method incorporates a pretrained generator and Wasserstein distance to enhance training stability and diversity. We evaluated ADSeqGAN across three representative cases. First, on nucleic acid- and protein-targeting molecules, ADSeqGAN shows superior capability in generating nucleic acid binders compared with baseline models. Second, through oversampling, it markedly improves CNS drug generation, achieving higher yields than traditional de novo models. Third, in cannabinoid receptor type 1 (CB1) ligand design, ADSeqGAN generates novel druglike molecules with 32.8% predicted actives surpassing hit rates of CB1-focused and general-purpose libraries when assessed by a target-specific LRIP-SF scoring function. 
Overall, ADSeqGAN offers a versatile framework for molecular design in data-scarce scenarios with demonstrated applications in nucleic acid binders, CNS drugs, and CB1 ligands.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.3,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145123830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Physics-Embedded Machine Learning Model for Phase Equilibrium Prediction in Multicomponent Systems.","authors":"Yue Yang,Shiang-Tai Lin","doi":"10.1021/acs.jcim.5c01804","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c01804","url":null,"abstract":"We present TeNNet-SAC (Thermodynamics-embedded Neural Network for Segment Activity Coefficient) model, a novel machine learning framework for predicting activity coefficients in liquid mixtures using only the SMILES representations of the constituent molecules. Inspired by the quantum chemistry-based COSMO-SAC model, TeNNet-SAC evaluates activity coefficients by summing contributions from molecular surface segments. The model comprises three core components: (1) a σ-profile predictor, which generates molecular fingerprints (i.e., surface segment charge histogram or σ-profile) directly from SMILES; (2) a geometry predictor, which estimates molecular volume and surface area from SMILES; and (3) a Γ predictor, which computes the activity coefficients of surface segments in solution. The σ-profile and geometry predictors are trained on 39,745 quantum solvation calculations. The Γ predictor is initially pretrained on one million synthetic data points to capture physically consistent behavior and is subsequently fine-tuned end-to-end using experimental activity coefficient data to improve predictive accuracy. The base TeNNet-SAC model achieves accuracy comparable to COSMO-SAC, while the fine-tuned version consistently outperforms COSMO-SAC across benchmark systems. 
By treating segment activity coefficients as intermediate variables, TeNNet-SAC naturally generalizes to multicomponent mixtures and satisfies thermodynamic consistency, offering a robust and scalable solution for activity coefficient prediction.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"17 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145103339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning Tackles the Challenge of Powder X-ray Diffraction Indexing for All Crystal Systems.","authors":"Ke Shu,Dong-Yun Gui,Wei-Xin Yan,Chun-Hai Wang","doi":"10.1021/acs.jcim.5c01506","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c01506","url":null,"abstract":"The indexing of powder X-ray diffraction (PXRD) in ab initio unknown structure determinations is a critical yet challenging step in crystallography, particularly for low-symmetry systems (e.g., monoclinic, triclinic) and/or large unit cell systems (V > 1000 Å³). In this work, a machine learning-based indexing method is presented, which achieves high-precision, end-to-end prediction of crystal symmetry and unit cell parameters from powder diffraction peaks for all crystal systems. The trained models (denoted as AIdex) achieve a top-5 accuracy of ∼97% in extinction group (symmetry class) identification, and a mean absolute percentage error (MAPE) <5% for indexing, demonstrating significant improvements in both accuracy and time consumed compared to traditional algorithms (TREOR/ITO/DICVOL). AIdex also shows high capacity for experimental applications, maintaining a success rate of ∼90% even under extreme conditions involving zero-shift error (±0.6°) and uncertainty noise (±0.15°). Applied to practical PXRD data, AIdex gives predicted unit cell parameters close to the experimentally refined ones (MAPE < 5%), serving as ideal initial inputs for further Pawley refinements. 
This leads to a new paradigm for rapid indexing in ab initio unknown structure determination, facilitating the advancement of crystallographic analysis toward automation and intelligent methodologies.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"1 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145103337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"gmx_ffconv: A Fast, User-Friendly Semi-Automated All-Atom Force Field Converter for GROMACS.","authors":"Jasmine E Aaltonen","doi":"10.1021/acs.jcim.5c02200","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c02200","url":null,"abstract":"This application note presents gmx_ffconv, a command-line tool developed to facilitate the conversion of systems between all-atom force fields within GROMACS. As different force fields use their own naming conventions and atom ordering, force field conversion within GROMACS is usually a time-consuming, error-prone process. gmx_ffconv resolves atom ordering and naming mismatches between different force fields by reordering the coordinate file via molecular graph matching. This enables the use of identical starting coordinates across force fields, facilitating comparative simulations without requiring manual reordering or scripting. The tool has been validated on a broad range of systems, from small, nonstandard ligands to large, solvated heterogeneous systems with more than two million atoms. gmx_ffconv is available on GitHub: github.com/Jassu1998/gmx_ffconv.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"85 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145093551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-Modal Interaction-Aware Progressive Fusion Network for Drug-Target Interaction Prediction.","authors":"Zhichong Cao,Jing Xie,Junlin Xu,Bo Li","doi":"10.1021/acs.jcim.5c01429","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c01429","url":null,"abstract":"Drug-target interaction (DTI) prediction plays a pivotal role in drug discovery. In recent years, deep learning-based models have been advanced rapidly, accelerating the identification of potential DTIs. However, how to effectively capture the cross-modal information from bidirectional DTIs and how to further fuse them remain challenges for existing methods. To address these issues, we propose a deep learning fusion framework termed cross-modal interaction-aware progressive fusion network (CIPFN) for DTI prediction. This framework introduces a bidirectional interaction-aware module to precisely align fine-grained interactions between drugs and proteins. In addition, a progressive fusion network is also developed, including both gated and convolutional fusion blocks, to efficiently extract critical information within drug-target relationships. Experimental results across five benchmark data sets demonstrate that the proposed CIPFN achieves significant improvements over some state-of-the-art methods on the metrics of AUROC, AUPRC, F1, sensitivity, and accuracy.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"29 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145083505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PPAP: A Protein-protein Affinity Predictor Incorporating Interfacial Contact-Aware Attention.","authors":"Jie Qian,Lin Yang,Zhen Duan,Renxiao Wang,Yifei Qi","doi":"10.1021/acs.jcim.5c01390","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c01390","url":null,"abstract":"Protein-protein interactions (PPIs) play fundamental roles in biological processes and therapeutic development. Accurately predicting PPI binding affinity is critical for understanding interaction mechanisms and guiding protein engineering. Recent advances in structure prediction like AlphaFold have enabled accurate modeling of protein-protein complexes, creating new opportunities for structure-based affinity prediction. However, existing methods predominantly rely on sequence information and fail to fully exploit structural insights at interaction interfaces. To address this gap, we propose PPAP, a novel deep learning framework that integrates structural features with sequence representations through an interfacial contact-aware attention mechanism. Our model demonstrated superior prediction performance across all evaluated data sets, outperforming strong sequence-based large language models on the internal test (R = 0.540, MAE = 1.546). On the external test set, our model achieved a higher Pearson correlation coefficient (R = 0.63) than all benchmarked models. In protein binder design, we further demonstrate that incorporating our model's prediction can enhance enrichment by up to 10-fold in comparison to the metrics based on AlphaFold-Multimer prediction. 
Given its robust performance, PPAP holds promise as a valuable tool not only for protein design but also for a wide range of protein interaction-related applications.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"37 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145083504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mechanism-Driven Features Enable Asn Deamidation Reactivity Prediction via Machine Learning Methods.","authors":"Maria Laura De Sciscio,Rosa De Troia,Joann Kervadec,Fabio Centola,Simona Saporiti,Muriel Priault,Marco D'Abramo","doi":"10.1021/acs.jcim.5c01386","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c01386","url":null,"abstract":"The spontaneous deamidation of Asparagine (Asn) residues is a common post-translational modification of proteins that can occur on disparate time scales, ranging from hours to thousands of years. This variability in the reaction rate reflects the influence of structural and environmental factors on the multistep mechanism of the deamidation reaction. Understanding the fine connection between reactivity and these modulating factors is essential to advance our knowledge of the deamidation kinetics in proteins and improve the prediction of deamidation-prone residues. In this work, we assessed the step-specific structural-dynamics parameters underlying the chemical basis of the first two reaction stages (the deprotonation and ring-closure steps) and developed novel descriptors derived from molecular dynamics (MD) simulations, which encompass solvation, hydrogen bonds, conformational free energy, and an environment electrostatic effect. These descriptors were evaluated across 63 Asn residues from six distinct proteins and used as input features for three machine learning models, Random Forest, Naive Bayes, and Logistic Regression, to classify Asn residue reactivity. 
Among these, the Random Forest classifier achieved the best predictive metrics, underscoring the significance of mechanism-tailored features in discriminating Asn reactivity and unveiling the key physicochemical factors that govern deamidation rates in proteins.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"89 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145089800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting Protein-Protein Docking: A Systematic Evaluation Framework.","authors":"Linlong Jiang,Ke Zhang,Kai Zhu,Ying Wang,Yu Kang,Tingjun Hou","doi":"10.1021/acs.jcim.5c01399","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c01399","url":null,"abstract":"Protein-protein interactions play pivotal roles in a wide range of biological processes. Determining the atomic-level structures of protein-protein complexes is indispensable for elucidating macromolecular interaction mechanisms and advancing structure-based drug design. Protein-protein docking, as one of the leading computational approaches for predicting complex structures, has seen considerable progress but requires rigorous evaluation in practical applications. In this study, we proposed a comprehensive benchmarking framework to evaluate 11 docking methods spanning traditional (HDOCK, PatchDock, PIPER, ZDOCK) and deep learning (DL)-based (EquiDock, ElliDock, EBMDock, GeoDock, DiffDock-PP, AlphaFold-Multimer, AlphaFold3) approaches. Our framework incorporates the classical DockingBenchmark 5.5 data set for evaluating flexible docking, introduces a newly curated data set (AACBench) for antibody-antigen complex docking, and establishes the PPCBench data set to examine the out-of-distribution (OOD) generalization capabilities of DL-based methods. In docking against apo structures, AlphaFold3 achieves a superior top-5 success rate of 77.98%, whereas the traditional approach HDOCK reaches merely 12.84%, despite its highest top-5 success rate of 85.24% when docking against holo structures. For antibody-antigen docking, AlphaFold3 remains the most accurate method (top-5 success rate: 31.78%) and substantially outperforms AlphaFold-Multimer in modeling the CDR-H3 loop. In OOD generalization tests, all DL-based models exhibit markedly reduced performance on the PPCBench data set. 
Overall, our work establishes a unified benchmarking framework that enables systematic evaluation of docking methods across diverse tasks and provides critical insights into the strengths and limitations of current docking strategies, thereby informing future developments in protein-protein docking research.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"16 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145083559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structural Basis for Negative Regulation of ABA Signaling by ROP11 GTPase.","authors":"Chuankai Zhao, Hassan Nadeem, Diwakar Shukla","doi":"10.1021/acs.jcim.5c02002","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c02002","url":null,"abstract":"Abscisic acid (ABA) is an essential plant hormone that is responsible for plant development and stress responses. Recent structural and biochemical studies have identified the key components involved in the ABA signaling cascade, including PYR/PYL/RCAR receptors, protein phosphatases PP2C, and protein kinases SnRK2. The plant-specific Rho-like (ROPs) small GTPases are negative regulators of ABA signal transduction by interacting with PP2C, which can shut off \"leaky\" ABA signal transduction caused by the constitutive activity of monomeric PYR/PYL/RCAR receptors. However, the structural basis for the negative regulation of ABA signaling by ROP GTPases remains elusive. In this study, we have utilized large-scale coarse-grained (10.05 ms) and all-atom molecular dynamics simulations and standard protein-protein binding free energy calculations to predict the complex structure of AtROP11 and phosphatase AtABI1. In addition, we have predicted the detailed complex association pathway and identified the critical residue pairs in AtROP11 and AtABI1 for complex stability. 
Overall, this study established a powerful framework for using large-scale molecular simulations to predict unknown protein complex structures and suggested the molecular mechanism of the negative regulation of ABA signal transduction by small GTPases.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.3,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145084559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}