{"title":"Heterogeneous Graph Contrastive Learning with Graph Diffusion for Drug Repositioning.","authors":"Guishen Wang,Honghan Chen,Handan Wang,Yuyouqiang Fu,Caiye Shi,Chen Cao,Xiaowen Hu","doi":"10.1021/acs.jcim.5c00435","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00435","url":null,"abstract":"Drug repositioning, which identifies novel therapeutic applications for existing drugs, offers a cost-effective alternative to traditional drug development. However, effectively capturing the complex relationships between drugs and diseases remains challenging. We present HGCL-DR, a novel heterogeneous graph contrastive learning framework for drug repositioning that effectively integrates global and local feature representations through three key components. First, we introduce an improved heterogeneous graph contrastive learning approach to model drug-disease relationships. Second, for local feature extraction, we employ a bidirectional graph convolutional network with a subgraph generation strategy in the bipartite drug-disease association graph, while utilizing a graph diffusion process to capture long-range dependencies in drug-drug and disease-disease relation graphs. Third, for global feature extraction, we leverage contrastive learning in the heterogeneous graph to enhance embedding consistency across different feature spaces. Extensive experiments on four benchmark data sets using 10-fold cross-validation demonstrate that HGCL-DR consistently outperforms state-of-the-art baselines in AUPR, AUROC, and F1-score metrics. Ablation studies confirm the significance of each proposed component, while case studies on Alzheimer's disease and breast neoplasms validate HGCL-DR's practical utility in identifying novel drug candidates. 
These results establish HGCL-DR as an effective approach for computational drug repositioning.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"14 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144065607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Entabolons: How Metabolites Modify the Biochemical Function of Proteins and Cause the Correlated Behavior of Proteins in Pathways.","authors":"Jeffrey Skolnick,Bharath Srinivasan,Samuel Skolnick,Brice Edelman,Hongyi Zhou","doi":"10.1021/acs.jcim.5c00462","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00462","url":null,"abstract":"Although there are over 100,000 distinct human metabolites, their biological significance is often not fully appreciated. Metabolites can reshape the protein pockets to which they bind by COLIG formation, thereby influencing enzyme kinetics and altering the monomer-multimer equilibrium in protein complexes. Binding a common metabolite to a set of protein monomers or multimers results in metabolic entanglements that couple the conformational states and functions of nonhomologous, nonphysically interacting proteins that bind the same metabolite. These shared metabolites might provide the collective behavior responsible for protein pathway formation. Proteins whose binding and functional behavior is modified by a set of metabolites are termed an \"entabolon\"─a portmanteau of metabolic entanglement and metabolon. 55%-60% (22%-24%) of pairs of nonenzymatic proteins that likely bind the same metabolite have a p-value that they are in the same pathway, which is <0.05 (0.0005). Interestingly, the most populated pairs of proteins common to multiple pathways bind ancient metabolites. Similarly, we suggest how metabolites can possibly activate, terminate, or preclude transcription and other nucleic acid functions and may facilitate or inhibit the binding of nucleic acids to proteins, thereby influencing transcription and translation processes. 
Consequently, metabolites likely play a critical role in the organization and function of biological systems.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"41 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144066851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Property-Oriented Reverse Design of Hydrocarbon Fuels Based on c-infoGAN.","authors":"Ruichen Liu,Huiying Wang,Tianren Zhang,Guozhu Liu,Li Wang,Xiangwen Zhang,Guozhu Li","doi":"10.1021/acs.jcim.5c00676","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00676","url":null,"abstract":"Fuel design is usually \"forward\": candidate molecular structures are designed first, and then their properties are predicted for screening. Owing to the large latent space of organic molecules (on the order of 10^60), reverse design by giving target fuel properties is urgently needed. However, it is hardly realized due to the unknown complex rule of the structure-property relationship. In this work, reverse design of hydrocarbon fuels is realized based on the conditional generative adversarial network of hydrocarbon molecules. Two deep generative models, c-GAN and c-infoGAN, are established and trained for generating new candidate fuel molecules when target fuel properties are input. c-infoGAN exhibited superior generation ability in terms of the validity, uniqueness, and novelty of the as-generated molecules. JP-10, a classical hydrocarbon fuel, was rediscovered by c-infoGAN. The latent space of fuels constructed by c-infoGAN is ordered, as proved by linear interpolation and linear algebra in this high-dimensional space. Given the target of high density, low freezing point, high heating value, and large specific impulse, 27 new fuel molecules with novel structures, high diversity, and expected properties were designed. One of the as-designed fuels was experimentally synthesized and tested, which verifies the robust design ability of c-infoGAN. 
This work opens new avenues for the design of new hydrocarbon fuels to meet the strict requirements of next-generation engines.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"124 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144065887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ABP-Xplorer: A Machine Learning Approach for Prediction of Antibacterial Peptides Targeting Mycobacterium abscessus-tRNA-Methyltransferase (TrmD).","authors":"Munawar Abbas,Kashif Iqbal Sahibzada,Shumaila Shahid,Numan Yousaf,Yuansen Hu,Dong-Qing Wei","doi":"10.1021/acs.jcim.5c00663","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00663","url":null,"abstract":"Mycobacterium abscessus (MAB) infections pose a significant treatment challenge due to their intrinsic resistance to antibiotics, requiring prolonged multidrug regimens with limited success and frequent relapses. tRNA (m1G37) methyltransferase (TrmD), an enzyme essential for maintaining the reading frame during protein synthesis in MAB and other mycobacteria, is a potential therapeutic target for identifying new inhibitors. This study introduces ABP-Xplorer, a machine learning-based (ML) model designed to predict the antibacterial potential of peptides targeting MAB-TrmD ribosomal sites. A systematic evaluation of 26 machine learning models identified the Random Forest (RF) classifier as the most effective, achieving 96% accuracy. To address data set imbalance and enhance predictive reliability, the Synthetic Minority Oversampling Technique (SMOTE) was applied, improving model generalization and reducing bias. After that, an ABP-Xplorer streamlit was developed to predict positive and negative antibacterial peptides (ABP), enabling easy sequence input and classification based on predictive scoring. For validation, 12 positive peptides with high predictive scores were selected for molecular docking by HADDOCK. Docking analysis of selected peptides confirmed strong binding to TrmD, with P1, P7, P8, and P9 as top candidates. Notably, P1 exhibited the best interaction with a HADDOCK score of -102.2, followed by P7 (-93.6) and P8 (-91.4), indicating their potential for further development as TrmD inhibitors. Moreover, Ramachandran plot analysis validated the structural reliability. 
Future research should focus on the experimental validation of these peptides and optimizing their stability and bioavailability for therapeutic applications.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"6 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144066850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiscale Computational Protocols for Accurate Residue Interactions at the Flexible Insulin-Receptor Interface.","authors":"Yevgen P Yurenko,Anja Muždalo,Michaela Černeková,Adam Pecina,Jan Řezáč,Jindřich Fanfrlík,Lenka Žáková,Jiří Jiráček,Martin Lepšík","doi":"10.1021/acs.jcim.5c00772","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00772","url":null,"abstract":"The quantitative characterization of residue contributions to protein-protein binding across extensive flexible interfaces poses a significant challenge for biophysical computations. It is attributable to the inherent imperfections in the experimental structures themselves, as well as to the lack of reliable computational tools for the evaluation of all types of noncovalent interactions. This study leverages recent advancements in semiempirical quantum-mechanical and implicit solvent approaches embodied in the PM6-D3H4S/COSMO2 method for the development of a hierarchical computational protocol encompassing molecular dynamics, fragmentation, and virtual glycine scan techniques for the investigation of flexible protein-protein interactions. As a model, the binding of insulin to its receptor is selected, a complex and dynamic process that has been extensively studied experimentally. The interaction energies calculated at the PM6-D3H4S/COSMO2 level in ten molecular dynamics snapshots did not correlate with molecular mechanics/generalized Born interaction energies because only the former method is able to describe nonadditive effects. This became evident by the examination of the energetics in small-model dimers featuring all the present types of noncovalent interactions with respect to DFT-D3 calculations. The virtual glycine scan has identified 15 hotspot residues on insulin and 15 on the insulin receptor, and their contributions have been quantified using PM6-D3H4S/COSMO2. 
The accuracy and credibility of the approach are further supported by the fact that all the insulin hotspots have previously been detected by biochemical and structural methods. The modular nature of the protocol has enabled the formulation of several variants, each tailored to specific accuracy and efficiency requirements. The developed computational strategy is firmly rooted in general biophysical chemistry and is thus offered as a general tool for the quantification of interactions across relevant flexible protein-protein interfaces.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"3 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144065886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vina-CUDA: An Efficient Program with in-Depth Utilization of GPU to Accelerate Molecular Docking.","authors":"Chunfeng Li,Yizhuo Wang,Hongbo Xing,Yidan Wang,Yang Wang,Jiawei Ye","doi":"10.1021/acs.jcim.4c01933","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01933","url":null,"abstract":"As a mainstream technology in modern drug discovery, molecular docking methodologies enable precise and efficient identification of lead compounds within large chemical repositories to improve drug development efficiency and reduce costs. The exponential growth of chemical databases has substantially expanded drug discovery resources while improving the identification rates of true positives in lead compounds. However, this rapid expansion poses significant challenges for existing docking tools to efficiently screen lead compounds from these massive chemical libraries. In this study, we proposed Vina-CUDA, which leverages GPU hardware features to optimize and accelerate the core algorithm of the popular tool AutoDock Vina in three aspects (computational capability, memory access, and resource utilization), significantly improving docking efficiency. A hybrid parallel optimization strategy integrating task and computational parallelism was implemented, accompanied by systematic code and data structure optimization, to maximize GPU resource utilization and enhance computational efficiency. Building upon this, we developed its derivatives, QuickVina2-CUDA and QuickVina-W-CUDA, as well as a user-friendly multi-GPU docking framework to utilize multi-GPU resources to accelerate large-scale virtual screening tasks. The performance and docking accuracy of Vina-CUDA and its derivatives were evaluated under five chemical databases. 
Results showed that, compared to baseline programs, Vina-CUDA with RILC-BFGS optimization algorithm achieved average and maximum accelerations of 3.71× and 6.89× across five databases, while QuickVina2-CUDA and QuickVina-W-CUDA achieved average speedups of 6.19× and 1.46×, respectively, without compromising docking accuracy. Furthermore, Vina-CUDA and its derivatives demonstrated comparable performance to baseline docking programs in docking, scoring, and ranking power, with excellent scalability and portability.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"124 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144065889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative Study of the Bending Free Energies of C- and G-Based DNA: A-, B-, and Z-DNA and Associated Mismatched Trinucleotide Repeats.","authors":"Ashkan Fakharzadeh,Mahmoud Moradi,Celeste Sagui,Christopher Roland","doi":"10.1021/acs.jcim.5c00541","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00541","url":null,"abstract":"DNA's structural flexibility plays a crucial role in various biological functions such as gene replication, repair, and regulation as well as DNA-protein recognition. We investigate the bending free energy of short DNA helices, including d(5'-(CG)7C-3')2 in A-, B-, and Z-forms, and C- and G-rich trinucleotide repeat helices, using orientation quaternions with enhanced sampling methods. The orientation quaternion technique provides an effective method to induce rotational transformations or to restrain the orientation of certain domains of biomolecular systems. This methodology was implemented in the AMBER simulation package and used to induce DNA bending in two separate ways: free bending and directional bending. We found that the bending free energy varies quadratically for moderate bending and then becomes almost linear for larger bending angles. The left-handed Z-DNA helix was found to exhibit the highest rigidity among the canonical DNA forms studied. The mechanisms associated with bending were also investigated with evidence for type I and type II kinks depending on the sequence and the helical form considered. The duplexes exhibit high flexibility in the presence of CC and GG mismatches, particularly CGG and GGC trinucleotide repeats in the Z-form, which have the lowest bending free energies. 
These calculations provide new insight into the mechanics of the global conformational flexibility of DNA molecules by quantifying the energetic cost and preferred directions of bending.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"28 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144065888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A High-Throughput Computational Protocol for Tuning Molecular Properties: Application to ESIPT Chromophores.","authors":"Isabella C D Merritt,Frédéric Castet","doi":"10.1021/acs.jcim.5c00692","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00692","url":null,"abstract":"Over the past decade, improvements in computing power and theoretical approaches have enabled high-throughput computational investigations of systems. In this work, we present the development of a simple automated computational protocol for the study of molecular substitutions to known molecules, which minimizes human error and effort while capitalizing on existing calculations to optimize computational cost. We demonstrate the use of our protocol on three test-cases of known chromophores undergoing intramolecular proton transfer: (1) a focused study of 12 molecular derivatives, where the protocol is run locally on a standard laptop, (2) a larger study of 169 derivatives, which allows for investigation of trends influencing ground- and excited-state reactivity, and (3) a realistic study of how our protocol could be used to investigate a large (over 700) set of derivative molecules and select candidates fulfilling required criteria for a particular given application. 
Our protocol, available online, is designed to be user-friendly and lightweight and enables efficient and straightforward screening of hundreds of molecular derivatives.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"50 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144065891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GMAMDA: Predicting Metabolite-Disease Associations Based on Adaptive Hardness Negative Sampling and Adaptive Graph Multiple Convolution.","authors":"Binglu Hu,Ying Su,Xuecong Tian,Chen Chen,Cheng Chen,Xiaoyi Lv","doi":"10.1021/acs.jcim.5c00694","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00694","url":null,"abstract":"Metabolites are small molecules produced during organism metabolism, with their abnormal concentrations closely linked to the onset and progression of various diseases. Accurate prediction of metabolite-disease associations is crucial for early diagnosis, mechanistic exploration, and treatment optimization. However, existing algorithms often overlook the integration of node features and neglect the impact of different hop domains on nodes in the processing of heterogeneous graphs. Furthermore, current methods solely rely on random sampling for selecting negative samples without considering their reliability, thereby compromising model stability. A novel metabolite-disease association prediction model, GMAMDA, is proposed to address these challenges. GMAMDA integrates adaptive hardness negative sampling, adaptive graph multiple convolution techniques, and a multiheterogeneous graph fusion strategy to forecast potential metabolite-disease associations. Initially, by computing multisource similarity information for metabolites and diseases, multiple heterogeneous graph networks are established for metabolite-disease association networks. Subsequently, the adaptive graph's multiconvolution mechanism is employed to generate feature-rich node representations across various heterogeneous graphs by dynamically leveraging information from different hop neighborhoods. The model then utilizes an adaptive hardness negative sampling approach based on principal component analysis to select negative samples with the highest information content for training, enabling the prediction of potential associations between new metabolites and diseases. 
Experimental findings demonstrate that GMAMDA outperforms state-of-the-art methods across various evaluation metrics, including AUC (0.9962 ± 0.0014), AUPR (0.9967 ± 0.0009), and accuracy (0.9733 ± 0.0042). Case studies focusing on Alzheimer's disease and kidney disease further validate GMAMDA's clinical potential in predicting metabolite markers.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"54 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144065893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A View on Molecular Complexity from the GDB Chemical Space.","authors":"Ye Buehler,Jean-Louis Reymond","doi":"10.1021/acs.jcim.5c00334","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00334","url":null,"abstract":"One recurring question when choosing which molecules to select for investigation is that of molecular complexity: is there a price to pay for complexity in terms of synthesis difficulty, and does complexity have anything to do with biological properties? In the chemical space of small organic molecules enumerated from mathematical graphs in the GDBs (Generated DataBases), most compounds are too complex and challenging for synthesis despite containing only standard functional groups and ring types. For these GDB molecules, we find that an increasing fraction (MC1) or number (MC2) of non-divalent nodes in the molecular graph represent simple measures of molecular complexity, which we interpret in terms of potential synthesis difficulties. We also show that MC1 and MC2 are applicable to commercial screening compounds (ZINC), bioactive molecules (ChEMBL) and natural products (COCONUT) and compare them with previously reported measures of molecular complexity and synthetic accessibility.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"15 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144065892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}