{"title":"scMDCL: A Deep Collaborative Contrastive Learning Framework for Matched Single-Cell Multiomics Data Clustering","authors":"Wenhao Wu, Shudong Wang*, Kuijie Zhang, Hengxiao Li, Sibo Qiao, Yuanyuan Zhang and Shanchen Pang","doi":"10.1021/acs.jcim.4c02114","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c02114","url":null,"abstract":"<p>Single-cell multiomics clustering integrates multiple omics data to analyze cellular heterogeneity and is crucial for uncovering complex biological processes and disease mechanisms. However, existing matched single-cell multiomics clustering methods often neglect the full utilization of intercellular relationships and the interactions and synergy between features from different omics, leading to suboptimal clustering performance. In this paper, we propose a deep collaborative contrastive learning framework for matched single-cell multiomics data clustering, named scMDCL. This framework fully leverages intercell relationships while enhancing feature interactions among identical cells across different omics data, thereby facilitating efficient clustering of multiomics data. Specifically, to fully utilize the topological information between cells, a graph autoencoder and a feature information enhancement module are designed for different omics, enabling the extraction and augmentation of cell features. Additionally, contrastive learning techniques are employed to strengthen the interactions among the different omics features of the same cell. Ultimately, multiomics deep collaborative clustering modules are utilized to achieve single-cell multiomics clustering. Extensive experiments conducted on nine publicly available single-cell multiomics datasets demonstrate the superior performance of the proposed framework in integrating multiomics data for clustering tasks.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":"65 6","pages":"3048–3063"},"PeriodicalIF":5.6,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143675749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MOLGAECL: Molecular Graph Contrastive Learning via Graph Auto-Encoder Pretraining and Fine-Tuning Based on Drug–Drug Interaction Prediction","authors":"Yu Li*, Lin-Xuan Hou, Hai-Cheng Yi, Zhu-Hong You*, Shi-Hong Chen, Jia Zheng, Yang Yuan and Cheng-Gang Mi","doi":"10.1021/acs.jcim.5c00043","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00043","url":null,"abstract":"<p>Drug-drug interactions influence drug efficacy and patient prognosis, providing substantial research value. Some existing methods struggle with the challenges posed by sparse networks or lack the capability to integrate data from multiple sources. In this study, we propose MOLGAECL, a novel approach based on graph autoencoder pretraining and molecular graph contrastive learning. Initially, a large number of unlabeled molecular graphs are pretrained using a graph autoencoder, where graph contrastive learning is applied for more accurate representation of the drugs. Subsequently, a full-parameter fine-tuning is performed on different data sets to adapt the model for drug interaction-related prediction tasks. To assess the effectiveness of MOLGAECL, comparison experiments with state-of-the-art methods, fine-tuning comparison experiments, and parameter sensitivity analysis are conducted. Extensive experimental results demonstrate the superior performance of MOLGAECL. Specifically, MOLGAECL achieves an average increase of 6.13% in accuracy, 6.14% in AUROC, and 8.16% in AUPRC across all data sets.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":"65 6","pages":"3104–3116"},"PeriodicalIF":5.6,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143675737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CMDmpnn: Combining Comparative Molecular Dynamics and ProteinMPNN to Rapidly Expand Enzyme Substrate Spectrum","authors":"Chuan-qi Sun, Zhi-min Li*, Yu Ji*, Ulrich Schwaneberg and Zong-lin Li*","doi":"10.1021/acs.jcim.5c00117","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00117","url":null,"abstract":"<p>Expanding enzyme substrate spectra enhances industrial applications and drives sustainable biocatalysis. Despite advances, challenges in modification efficiency and high-throughput screening persist. Here, we developed a virtual screening method called CMDmpnn that combines comparative molecular dynamics (MD) simulations and ProteinMPNN to broaden enzyme substrate spectra without compromising other industrially important properties of enzymes, such as thermostability. Using glycosyltransferase as a model, we first established a dynamic model library of the wild-type enzyme through MD simulations and performed clustering. Subsequently, we utilized ProteinMPNN to generate a comprehensive set of new sequences for the entire library, enabling rapid identification of all possible enzyme variants. Short MD simulations were then conducted on variant–substrate complex models, with results compared to those of the wild-type enzyme. By analyzing catalytically relevant information such as substrate binding modes and key atomic distances, we identified multiple variants capable of catalyzing a broad spectrum of phenolic compounds, all within a timeframe of less than 2 weeks. The CMDmpnn method offers a powerful and efficient tool for rapidly expanding enzyme substrate spectra.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":"65 6","pages":"2741–2747"},"PeriodicalIF":5.6,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143675819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cocry-pred: A Dynamic Resource Propagation Method for Cocrystal Prediction","authors":"Wenxiang Song, Ren Peng, Hongbo Yu, Meiling Zhan, Guixia Liu, Weihua Li, Guobin Ren, Bin Zhu* and Yun Tang*","doi":"10.1021/acs.jcim.5c00179","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00179","url":null,"abstract":"<p>Drug cocrystallization is a powerful strategy to enhance drug properties by modifying their physicochemical characteristics without altering their chemical structure. However, the identification of suitable coformers remains a challenging and resource-intensive task. To streamline this process, we developed a novel cocrystal prediction model, Cocry-pred, which utilizes the Network-Based Inference (NBI) algorithm─a dynamic resource propagation method─to recommend coformers for target molecules based on topological data from cocrystal network and molecular substructure information. We evaluated the impact of 13 types of molecular fingerprints and different numbers of propagation rounds on model performance. Additionally, to achieve optimal performance, we introduced three key hyperparameters─α (node weights), β (edge weights) and γ (penalty for high-degree nodes)─to balance the influence of various factors within the composite network. The best performance of Cocry-pred achieved an impressive AUC of 0.885 and an RS of 0.108. To validate the reliability of the model, we employed it to predict potential coformers for Apatinib. Subsequently, seven Apatinib cocrystals were then synthesized experimentally, among which single-crystal structures were obtained for two cocrystals. This advancement highlights the potential of Cocry-pred as a powerful tool, offering significant improvements in efficiency and providing valuable insights for cocrystal screening and design.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":"65 6","pages":"2868–2881"},"PeriodicalIF":5.6,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143675738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LLM-Driven Synthesis Planning for Quantum Dot Materials Development","authors":"So Eun Choi*, MiYoung Jang, SoHee Yoon, SangHyun Yoo, Jooyeon Ahn, Minho Kim, Ho-Gyeong Kim, Yebin Jung, Seongeon Park, Young-Seok Kim and Taekhoon Kim","doi":"10.1021/acs.jcim.4c01529","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01529","url":null,"abstract":"<p>The application of large language models in materials science has opened new avenues for accelerating materials development. Building on this advancement, we propose a novel framework leveraging large language models to optimize experimental procedures for synthesizing quantum dot materials with multiple desired properties. Our framework integrates the synthesis protocol generation model and the property prediction model, both fine-tuned on open-source large language models using parameter-efficient training techniques with in-house synthesis protocol data. Once the synthesis protocol with target properties and a masked reference protocol is generated, it undergoes validation through the property prediction models, followed by assessments of its novelty and human evaluation. Our synthesis experiments demonstrate that among the six synthesis protocols derived from the entire framework, three successfully update the Pareto front, and all six improve at least one property. Through empirical validation, we confirm the effectiveness of our fine-tuned large language model-driven framework for synthesis planning, showcasing strong performance under multitarget optimization.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":"65 6","pages":"2748–2758"},"PeriodicalIF":5.6,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143675818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large Model Era: Deep Learning in Osteoporosis Drug Discovery.","authors":"Junlin Xu, Xiaobo Wen, Li Sun, Kunyue Xing, Linyuan Xue, Sha Zhou, Jiayi Hu, Zhijuan Ai, Qian Kong, Zishu Wen, Li Guo, Minglu Hao, Dongming Xing","doi":"10.1021/acs.jcim.4c02264","DOIUrl":"10.1021/acs.jcim.4c02264","url":null,"abstract":"<p>Osteoporosis is a systemic microstructural degradation of bone tissue, often accompanied by fractures, pain, and other complications, resulting in a decline in patients' life quality. In response to the increased incidence of osteoporosis, related drug discovery has attracted more and more attention, but it is often faced with challenges due to long development cycle and high cost. Deep learning with powerful data processing capabilities has shown significant advantages in the field of drug discovery. With the development of technology, it is more and more applied to all stages of drug discovery. In particular, large models, which have been developed rapidly recently, provide new methods for understanding disease mechanisms and promoting drug discovery because of their large parameters and ability to deal with complex tasks. This review introduces the traditional models and large models in the deep learning domain, systematically summarizes their applications in each stage of drug discovery, and analyzes their application prospect in osteoporosis drug discovery. Finally, the advantages and limitations of large models are discussed in depth, in order to help future drug discovery.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":" ","pages":"2232-2244"},"PeriodicalIF":5.6,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143497544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MLR Data-Driven for the Prediction of Infinite Dilution Activity Coefficient of Water in Ionic Liquids (ILs) Using QSPR-Based COSMO Descriptors.","authors":"Ali Ebrahimpoor Gorji, Juho-Pekka Laakso, Ville Alopaeus, Petri Uusi-Kyyny","doi":"10.1021/acs.jcim.4c02095","DOIUrl":"10.1021/acs.jcim.4c02095","url":null,"abstract":"<p>To predict the partial molar excess enthalpy, entropy at infinite dilution, and phase equilibria, the availability of an infinite dilution activity coefficient is vital. The \"quantitative structure-activity/property relationship\" (QSAR/QSPR) approach has been used for the prediction of infinite dilution activity coefficient of water in ionic liquids using an extensive data set. The data set comprised 380 data points including 68 unique ILs at a wide range of temperatures, which is more extensive than previously published data sets. Moreover, new predictive QSAR/QSPR models including novel molecular descriptors, called \"COSMO-RS descriptors\", have been developed. Using two different techniques of external validation, the data set was divided to the training set for the development of models and to the validation set for external validation. Unlike former available models, internal validation using leave one/multi out-cross validations (LOO-CV/LMO-CV) and Y-scrambling methods were performed on the models using statistical parameters for further assessment. According to the obtained results of statistical parameters (<i>R</i><sup>2</sup> = 0.99 and <i>Q</i><sup>2</sup><sub>LOO-CV</sub> = 0.99), the predictive capability of the developed QSPR model was excellent for training set. Regarding the external validation, other statistical parameters such as AAD = 0.283 and AARD % = 30 were also satisfactory for the validation set. While the values of γ<sub>H<sub>2</sub></sub><sub>O</sub><sup>∞</sup> increase or decrease with increasing temperature, the QSAR/QSPR models based on the van't Hoff equation takes into account the negative and positive effects of temperature on the γ<sub>H<sub>2</sub></sub><sub>O</sub><sup>∞</sup> in ILs well, depending on the nature of ILs. It was also shown that γ<sub>H<sub>2</sub></sub><sub>O</sub><sup>∞</sup> in some new ILs which had not been experimentally studied before can be predicted using the QSPR model.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":" ","pages":"2530-2542"},"PeriodicalIF":5.6,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11898046/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143466455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GRADE and X-GRADE: Unveiling Novel Protein-Ligand Interaction Fingerprints Based on GRAIL Scores.","authors":"Christian Fellinger, Thomas Seidel, Benjamin Merget, Klaus-Juergen Schleifer, Thierry Langer","doi":"10.1021/acs.jcim.4c01902","DOIUrl":"10.1021/acs.jcim.4c01902","url":null,"abstract":"<p>Nonbonding molecular interactions, such as hydrogen bonding, hydrophobic contacts, ionic interactions, etc., are at the heart of many biological processes, and their appropriate treatment is essential for the successful application of numerous computational drug design methods. This paper introduces GRADE, a novel interaction fingerprint (IFP) descriptor that quantifies these interactions using floating point values derived from GRAIL scores, encoding both the presence and quality of interactions. GRADE is available in two versions: a basic 35-element variant and an extended 177-element variant. Three case studies demonstrate GRADE's utility: (1) dimensionality reduction for visualizing the chemical space of protein-ligand complexes using Uniform Manifold Approximation and Projection (UMAP), showing competitive performance with complex descriptors; (2) binding affinity prediction, where GRADE achieved reasonable accuracy with minimal machine learning optimization; and (3) three-dimensional-quantitative structure-activity relationship (3D-QSAR) modeling for a specific protein target, where GRADE enhanced the performance of Morgan Fingerprints.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":" ","pages":"2456-2475"},"PeriodicalIF":5.6,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11898076/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143466483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ASVirus: A Comprehensive Knowledgebase for the Viral Alternative Splicing","authors":"Yu-Hong Liu, Hong-Quan Xu, Si-Si Zhu, Yan-Feng Hong, Xiu-Wen Li, Hong-Xiu Li, Jun-Peng Xiong, Huan Xiao, Jin-Hui Bu, Feng Zhu* and Lin Tao*","doi":"10.1021/acs.jcim.4c02214","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c02214","url":null,"abstract":"<p>Viruses are significant human pathogens responsible for pandemic outbreaks and seasonal epidemics. Viral infectious diseases impose a devastating global burden and have a profound impact on public health systems. During viral infections, alternative splicing (AS) plays a crucial role in regulating immune responses, altering the host’s cellular environment, expanding viral genetic material, and facilitating viral replication. As research on AS in viral infections expands, it is crucial to consolidate data on virus-related splicing changes to improve our understanding of these viruses and associated diseases. To address this need, we created ASVirus (https://bddg.hznu.edu.cn/asvirus/), a comprehensive database of virus-associated AS events and their regulatory factors. ASVirus uniquely combines high-confidence, experimentally validated splicing data and investigates upstream regulatory mechanisms through a gene-splicing factor interaction network. Its user-friendly web interface offers detailed information into AS events from various viral families and the resulting mis-splicing in host genes, aiding the exploration of novel viral infection mechanisms and the identification of critical therapeutic targets for viral diseases.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":"65 6","pages":"2722–2729"},"PeriodicalIF":5.6,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143675793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EC2Vec: A Machine Learning Method to Embed Enzyme Commission (EC) Numbers into Vector Representations.","authors":"Mengmeng Liu, Xialong Ni, J Ramanujam, Michal Brylinski","doi":"10.1021/acs.jcim.4c02161","DOIUrl":"10.1021/acs.jcim.4c02161","url":null,"abstract":"<p>Enzyme commission (EC) numbers play a vital role in classifying enzymes and understanding their functions in enzyme-related research. Although accurate and informative encoding of EC numbers is essential for enhancing the effectiveness of machine learning applications, simple EC encoding approaches suffer from limitations such as false numerical order and high sparsity. To address these issues, we developed EC2Vec, a multimodal autoencoder that preserves the categorical nature of EC numbers and leverages their hierarchical relationships, resulting in more meaningful and informative representations. EC2Vec encodes each digit of the EC number as a categorical token and then processes these embeddings through a 1D convolutional layer to capture their relationships. Comprehensive benchmarking against a large collection of EC numbers indicates that EC2Vec outperforms simple encoding methods. The t-SNE visualization of EC2Vec embeddings revealed distinct clusters corresponding to different enzyme classes, demonstrating that the hierarchical structure of the EC numbers is effectively captured. In downstream machine learning applications, EC2Vec embeddings outperformed other EC encoding methods in the reaction-EC pair classification task, underscoring its robustness and utility for enzyme-related research and bioinformatics applications.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling","volume":" ","pages":"2173-2179"},"PeriodicalIF":5.6,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11898066/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143466480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Chemistry","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}