Yulong Wu,Jin Xie,Jing Nie,Bonan Ding,Yuansong Zeng,Jiale Cao
{"title":"BalancedDiff: Balanced Diffusion Network for High-Quality Molecule Generation.","authors":"Yulong Wu,Jin Xie,Jing Nie,Bonan Ding,Yuansong Zeng,Jiale Cao","doi":"10.1021/acs.jcim.5c00837","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00837","url":null,"abstract":"Traditional drug discovery and development are time-consuming and expensive. Deep learning-based molecule generation techniques can reduce costs and improve efficiency, helping to generate high-quality molecules with desirable properties. However, existing deep learning-based methods focus on designing complex network structures to extract key features, ignoring the impact of sample bias and rarely taking biochemical principles into account. To address these problems, we first propose a Balance Loss to counteract sample bias. Second, we design a KAN-based Balanced Feature Filtering (KBFF) module that balances molecular feature information with spatial location data, effectively filtering out unrelated groups. This approach ensures that the model considers both the chemical properties of functional groups and their spatial arrangements, minimizing noise while preserving critical biochemical relationships. By achieving this balance, the module improves the quality of the generated molecules. In addition, although diffusion models generate numerous molecules, their effectiveness and reliability remain uncertain, limiting their practical utility. To overcome this limitation, we introduce a QikProp module that predicts ADME properties, filtering out molecules with poor drug-like characteristics or potential safety risks, thereby enhancing the quality and applicability of generated molecules. 
Experiments on the CrossDocked2020 data set demonstrate the superiority of our method.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"44 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144311601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Luis A García-González,Yovani Marrero-Ponce,César R García-Jacas,Sergio A Aguila Puentes
{"title":"Optimal Descriptor Subset Search via Chemical Information and Target Activity-Guided Algorithm for Antimicrobial Peptide Prediction.","authors":"Luis A García-González,Yovani Marrero-Ponce,César R García-Jacas,Sergio A Aguila Puentes","doi":"10.1021/acs.jcim.5c00600","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00600","url":null,"abstract":"Antimicrobial peptides (AMPs) have emerged as a promising alternative to conventional drugs due to their potential applications in combating multidrug-resistant pathogens. Various computational approaches have been developed for AMP prediction, ranging from shallow learning methods to advanced deep learning techniques. Additionally, the performance of shallow learning models based on self-learning features derived from protein language models has recently been studied. However, the performance of AMP models based on shallow learning strongly depends on the quality of descriptors derived via manual feature engineering, which may miss crucial information by assuming that the initial descriptor set fully captures relevant information. The AExOp-DCS algorithm was introduced as an automatic feature domain optimization method that identifies the \"optimal\" descriptor set driven by the chemical structure and biological activity of the compounds under study. QSAR models built on AExOp-DCS optimized descriptors outperform those using nonoptimized sets. In this study, we explore the use of AExOp-DCS to identify optimal descriptor subsets for AMP modeling. Experimental results show that the descriptors returned by AExOp-DCS contain information comparable to those used in top-performing models while exhibiting higher discriminative capacity. The generated models based on the descriptors returned by AExOp-DCS achieved performance metric values comparable to state-of-the-art approaches while utilizing fewer descriptors, suggesting a more efficient modeling process. 
By reducing dimensionality without sacrificing accuracy, this approach contributes to the development of more efficient computational pipelines for AMP discovery. Finally, a Java software called AExOp-DCS-SEQ is freely available, enabling researchers to leverage its capabilities for peptide descriptor search and AMP classification tasks.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"38 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144311602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transfer-Learning Deep Raman Models Using Semiempirical Quantum Chemistry.","authors":"Jawad Kamran,Julian Hniopek,Thomas Bocklitz","doi":"10.1021/acs.jcim.5c00513","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00513","url":null,"abstract":"Biophotonic technologies such as Raman spectroscopy are powerful tools for obtaining highly specific molecular information. Due to its minimal sample preparation requirements, Raman spectroscopy is widely used across diverse scientific disciplines, often in combination with chemometrics, machine learning (ML), and deep learning (DL). However, Raman spectroscopy lacks large databases of independent Raman spectra for model training, leading to overfitting, overestimation, and limited model generalizability. We address this problem by generating simulated vibrational spectra using semiempirical quantum chemistry methods, enabling the efficient pretraining of deep learning models on large synthetic data sets. These pretrained models are then fine-tuned on a smaller experimental Raman data set of bacterial spectra. Transfer learning significantly reduces the computational cost while maintaining performance comparable to models trained from scratch in this real biophotonic application. 
The results validate the utility of synthetic data for pretraining deep Raman models and offer a scalable framework for spectral analysis in resource-limited settings.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"12 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144311600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Prediction of Drug-Protein Interactions through Physics-Based Few-Shot Learning.","authors":"Keqiong Zhang,Zhiran Fan,Qilong Wu,Jianfeng Liu,Sheng-You Huang","doi":"10.1021/acs.jcim.5c00427","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00427","url":null,"abstract":"Accurate prediction of drug-protein interactions is crucial for drug discovery. Due to the bottleneck of traditional scoring functions, many machine learning scoring functions (MLSFs) have been proposed for structure-based drug screening. However, existing MLSFs face two challenges: small data limitations and poor interpretability. To address these challenges, we have proposed a physics-based small data machine learning framework for interpretable and generalizable prediction of drug-protein interactions on the target with scarce positive data through a strategy of three training phases with three (score, weight, and ranking) loss functions, named DrugBaiter. DrugBaiter has been extensively evaluated on the 102 targets of DUD-E and 81 targets of DEKOIS 2.0 for drug screening, and compared with 14 other MLSFs. It is shown that our DrugBaiter model can significantly improve the drug screening performance even if few actives are known for a target. In addition, DrugBaiter is interpretable in describing the interactions at the atomic level. The power of DrugBaiter is also confirmed by a drug screening application on the SARS-CoV-2 main protease target. It is anticipated that DrugBaiter will serve as a general machine learning scoring model for screening novel drugs on new targets with scarce known actives. 
DrugBaiter is freely available at http://huanglab.phys.hust.edu.cn/DrugBaiter.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"93 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144311606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SparcleQC: Automated Input File Creation for QM/MM Studies of Protein:Ligand Complexes.","authors":"Caroline S Glick,Isabel P Berry,C David Sherrill","doi":"10.1021/acs.jcim.5c00617","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00617","url":null,"abstract":"SparcleQC is a Python package that, given a protein:ligand complex in the Protein Data Bank (PDB) file format, can create quantum mechanics/molecular mechanics (QM/MM)-like input files for the electronic structure theory packages Psi4, Q-Chem, and NWChem. The resulting input files include quantum mechanical representations of the ligand and a small section of the protein, surrounded by point charges that represent the rest of the protein. Creation of these QM/MM input files includes cutting and capping the QM subregion, obtaining point charges for the protein, and adjusting charges at the QM/MM boundary, and each of these tasks is automated by the software. In this article, we describe the details of SparcleQC's procedure, show examples of the Python API, and explain additional features that are helpful in protein:ligand interaction studies. 
Finally, we show that SparcleQC enables automated preparation of input files for QM/MM calculations, which can return accurate interaction energies in minutes, while a fully quantum mechanical computation on the protein:ligand complex could take days, if it is even possible.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"21 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144311603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Miguel Nouman, Richard B Canty, Brent A Koscher, Matthew A McDonald, Klavs F Jensen
{"title":"General Chemically Intuitive Atom- and Bond-Level DFT Descriptors for Machine Learning Approaches to Reaction Condition Prediction.","authors":"Miguel Nouman, Richard B Canty, Brent A Koscher, Matthew A McDonald, Klavs F Jensen","doi":"10.1021/acs.jcim.4c02255","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c02255","url":null,"abstract":"We demonstrate the usefulness of general atom- and bond-level density functional theory (DFT) descriptors to enhance the performance of neural networks for general reaction condition prediction. We treat condition prediction as a multiclass classification task and report the performance of neural networks and random forests as evaluated by 5-fold cross-validation on a 69,935 reaction data set with 296 distinct single-component reaction condition classes and varying input embedding compositions. We show that by combining structural and general DFT descriptors, models with up to 71% fewer trainable parameters than their purely structural counterparts can provide comparable or superior weighted precision, top-1 and top-3 accuracies. Moreover, we report improvements of up to 5, 10, and 11% in weighted precision, top-1 accuracy and F1 score, respectively, for neural networks trained on hybrid representations which combine general DFT and structural descriptors, when compared to structural models with equivalent architectures and input sizes. 
Remarkably, the best performing neural network trained on hybrid embeddings outperforms the best purely structural model investigated despite the latter benefiting from an embedding strategy with 267 times more data points than the one used for generating and embedding hybrid descriptors, with both strategies being unsupervised learning algorithms that share considerable conceptual and architectural similarities.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144309262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lexin Chen, Daniel R Roe, Ramón Alain Miranda-Quintana
{"title":"CADENCE: Clustering Algorithm─Density-Based Exploration and Novelty Clustering with Efficiency.","authors":"Lexin Chen, Daniel R Roe, Ramón Alain Miranda-Quintana","doi":"10.1021/acs.jcim.5c00392","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00392","url":null,"abstract":"Unsupervised learning techniques play a pivotal role in unraveling protein folding landscapes, constructing Markov State Models, expediting replica exchange simulations, and discerning drug binding patterns, among other applications. A fundamental challenge in current clustering methods lies in how similarities among objects are assessed. Traditional similarity operations are typically only defined over pairs of objects, and this limitation is at the core of many performance issues. The crux of the problem in this field is that efficient algorithms like k-means struggle to distinguish between metastable states effectively. However, more robust methods like density-based clustering demand substantial computational resources. Extended similarity techniques have been proven to swiftly pinpoint high and low-density regions within the data in linear O(N) time. This offers a highly convenient means to explore complex conformational landscapes, enabling focused exploration of rare events or identification of the most representative conformations, such as the medoid of the data set. 
In this contribution, we aim to bridge this gap by introducing a novel density clustering algorithm to the Molecular Dynamics Analysis with N-ary Clustering Ensembles (MDANCE) software package based on the n-ary similarity framework.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144309261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shanchen Pang,Zheqi Song,Yunyin Li,Wenhao Wu,Yu Zhang,Yuanyuan Zhang,Shudong Wang
{"title":"Higher-Order Weighted Perturbation-Based Multilevel Information Fusion Model for Predicting CircRNA-Disease Associations.","authors":"Shanchen Pang,Zheqi Song,Yunyin Li,Wenhao Wu,Yu Zhang,Yuanyuan Zhang,Shudong Wang","doi":"10.1021/acs.jcim.5c00946","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00946","url":null,"abstract":"CircRNAs are closely associated with the initiation and progression of various diseases, and investigating their potential associations is crucial for understanding disease mechanisms. Although existing computational methods have made notable progress, they often overlook the latent information contained in higher-order associations between circRNAs and diseases. Moreover, these methods often rely on a single linear or nonlinear feature learning approach, which fails to comprehensively capture multilevel features. To address this, a high-order weighted perturbation-based multilevel information fusion model (HWP-MIFM) is proposed. The model dynamically adjusts the weights of different orders using a higher-order weighted perturbation method to extract higher-order association information. A dual-stage matrix factorization module is applied to construct a multilayer structure and extract linear features. Additionally, a dual-path feature learning module is utilized to mine complex nonlinear relationships within similarity networks, ensuring the comprehensive capture of multilevel features. Experimental results demonstrate that, in 5-fold cross-validation on four data sets, HWP-MIFM outperforms seven state-of-the-art prediction methods in terms of overall performance. 
Ablation studies and case analyses further confirm the accuracy and practical value of the model.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"4 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144311604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adriana Coricello, Anna Lisa Chiaravalle, Maria Musgaard, Benjamin Gerald Tehan, Gian Marco Elisi, Giovanni Bottegoni
{"title":"Adiabatic-Bias Molecular Dynamics Simulations Reveal the Impact of Mutations on Muscarinic Antagonist Unbinding Kinetics.","authors":"Adriana Coricello, Anna Lisa Chiaravalle, Maria Musgaard, Benjamin Gerald Tehan, Gian Marco Elisi, Giovanni Bottegoni","doi":"10.1021/acs.jcim.5c00601","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00601","url":null,"abstract":"Ligand-target dissociation rates (k_off) strongly correlate with efficacy and safety profiles, as well as with the therapeutic effect of drugs. As a prototypical example, muscarinic receptor antagonists used as bronchodilators show similar affinity profiles toward the muscarinic M3 receptors (M3R) and M2 receptors (M2R), whereas their kinetic selectivity toward M3R avoids the adverse effects that a prolonged inhibition of M2R would induce at the cardiac level. Previous studies on the dissociation kinetics of human M3R showed that the residence time and binding affinity of muscarinic antagonists are deeply affected by the presence of specific mutations. The aim of our work was to reproduce the rankings of these experimental kinetic rates through an approach based on the application of adiabatic-bias molecular dynamics (ABMD) simulations using Path Collective Variables (PCVs), PCV-ABMD. Employing this methodology, we simulated the translocation of tiotropium, a long-acting bronchodilator targeting M3R, from the orthosteric site to the extracellular vestibule, without considering the whole unbinding process. The estimated times necessary for translocation displayed a strong correlation with the experimental pk_off values. Moreover, a thorough analysis of protein-ligand contacts provided deeper insights into the mechanism of unbinding of muscarinic antagonists. 
The newly described PCV-ABMD protocol captured relevant metastable states and offered a reliable approach for the prediction of kinetic selectivity in sets of mutants.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144309260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Biological Activity from Biosynthetic Gene Clusters Using Neural Networks.","authors":"Hemant Goyat, Dalwinder Singh, Sunaina Paliyal, Shrikant Mantri","doi":"10.1021/acs.jcim.5c00465","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00465","url":null,"abstract":"Microorganisms such as bacteria and fungi have been used for natural products that translate to drugs. However, assessing the bioactivity of extract from culture to identify novel natural molecules remains a strenuous process due to the cumbersome order of production, purification, and assaying. Thus, extensive genome mining of microbiomes is underway to identify biosynthetic gene clusters or BGCs that can be profiled as particular natural products, and computational methods have been developed to address this problem using machine learning. However, existing tools are ineffective due to a small training data set, dependence on old genome mining tools, lack of relevant genomic descriptors, and prevalent class imbalance. This work presents a new tool, NPBdetect, that can detect multiple bioactivities and has been designed through rigorous experiments. First, we composed a larger training set using the MIBiG database and a test set through literature mining to build and assess the model, respectively. Second, the latest antiSMASH genome mining tool was used to obtain BGCs, and new sequence-based descriptors were introduced. Third, neural networks are used to build the model by dealing with class imbalance issues through the class weighting technique. 
Finally, we compared the NPBdetect tool with an existing tool to show its efficacy and real-world utility in detecting several bioactivities with high confidence.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144309263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}