Molecular Informatics: Latest Articles, Page 4

Navigating a 1E+60 Chemical Space of Peptide/Peptoid Oligomers.

IF 2.8 | Zone 4 (Medicine)

Molecular Informatics | Pub Date: 2025-01-01 | Epub Date: 2024-10-10 | DOI: 10.1002/minf.202400186

Markus Orsi, Jean-Louis Reymond

Citations: 0

Active learning approaches in molecule pKi prediction.

IF 2.8 | Zone 4 (Medicine)

Molecular Informatics | Pub Date: 2025-01-01 | Epub Date: 2024-08-06 | DOI: 10.1002/minf.202400154

I M Kashafutdinova, A Poyezzhayeva, T Gimadiev, T Madzhidov

Abstract: Identifying compounds with suitable bioactivities is crucial in the early stages of drug design. Drug databases are vast, so only a limited subset of candidates can feasibly be assayed. The optimal method for selecting candidates while minimizing the overall number of assays involves an active learning (AL) approach. In this work, we benchmarked a range of AL strategies with two main objectives: (1) to identify a strategy that ensures high model performance, and (2) to select molecules with desired properties using minimal assays. To evaluate the different AL strategies, we used a simulated AL workflow based on "virtual" experiments. These experiments drew on ChEMBL datasets, where the biological activity values of the molecules are already known. For classification tasks, we also proposed a hybrid selection strategy that unifies exploration and exploitation AL strategies in a single acquisition function defined by parameters n and c. We further showed that the popular minimal-margin and maximal-variance approaches to exploration correspond to minimizing the hybrid acquisition function with n=1 and n=2, respectively. The coefficient c adjusts the balance between exploration and exploitation, which makes selecting the optimal strategy straightforward. The main strength of the hybrid selection method is its adaptability: changing the value of this contribution coefficient tailors the molecule selection criteria to a specific task.

Our analysis revealed that in regression tasks, AL strategies did not succeed in ensuring high model performance. However, they were successful in selecting molecules with desired properties using a minimal number of tests. In analogous classification experiments, two approaches were effective in building a high-performance predictive model from minimal data: the exploration strategy, and the hybrid selection function with c<1 (for n=1) and c≤0.2 (for n=2). When searching for molecules with desired properties, exploitation and the hybrid function with c≥1 (n=1) and c≥0.7 (n=2) identified such molecules in fewer iterations than random selection. Notably, with an intermediate coefficient value (c=0.7), the hybrid function successfully addressed both tasks simultaneously.

Citations: 0
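The abstract above does not give the exact form of the hybrid acquisition function, so the Python sketch below does not reproduce it. It only illustrates the generic building blocks the abstract names, assuming a binary classifier that outputs P(active) for each unlabeled molecule:

- Minimal-margin selection (exploration).
- Maximal-variance selection over an ensemble of models (exploration). This is one common variance-based form and may differ from the paper's definition.
- Greedy selection of the most likely actives (exploitation).

All function names, molecule IDs and probabilities are hypothetical.

```python
from statistics import pvariance


def select_min_margin(probs, k):
    """Exploration: the k molecules whose predicted P(active) is most ambiguous.

    For a binary classifier the margin between the two class probabilities is
    |p - (1 - p)| = |2p - 1|; the smallest margin is the most uncertain prediction.
    """
    margins = {mol: abs(2 * p - 1) for mol, p in probs.items()}
    return sorted(margins, key=margins.get)[:k]


def select_max_variance(ensemble_probs, k):
    """Exploration: the k molecules on which ensemble members disagree most."""
    variances = {mol: pvariance(ps) for mol, ps in ensemble_probs.items()}
    return sorted(variances, key=variances.get, reverse=True)[:k]


def select_greedy(probs, k):
    """Exploitation: the k molecules with the highest predicted P(active)."""
    return sorted(probs, key=probs.get, reverse=True)[:k]


# Hypothetical model outputs for an unlabeled pool of four molecules.
pool = {"mol_a": 0.95, "mol_b": 0.52, "mol_c": 0.10, "mol_d": 0.60}
ensemble = {
    "mol_a": [0.90, 0.95, 0.99],
    "mol_b": [0.20, 0.50, 0.85],
    "mol_c": [0.10, 0.12, 0.08],
    "mol_d": [0.55, 0.60, 0.65],
}

print(select_min_margin(pool, 2))        # ['mol_b', 'mol_d']
print(select_max_variance(ensemble, 1))  # ['mol_b']
print(select_greedy(pool, 2))            # ['mol_a', 'mol_d']
```

A simulated ("virtual") loop of the kind the abstract describes would work as follows. The already-known ChEMBL labels of the selected molecules are revealed, the model is retrained, and selection repeats.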

A Topology-Enhanced Multi-Viewed Contrastive Approach for Molecular Graph Representation Learning and Classification.

IF 2.8 | Zone 4 (Medicine)

Molecular Informatics | Pub Date: 2025-01-01 | DOI: 10.1002/minf.202400252

Phu Pham

Abstract: Graph representation learning has recently become a popular research topic. Graph embeddings have diverse applications across fields such as information and social network analysis, bioinformatics and cheminformatics, natural language processing (NLP), and recommendation systems. Among the advanced deep learning (DL)-based architectures used in graph representation learning, graph neural networks (GNNs) have emerged as the dominant and highly effective framework. Recent GNN-based methods have demonstrated state-of-the-art performance on complex supervised and unsupervised tasks at both the node and graph levels. In recent years, contrastive learning techniques have been developed to enhance multi-view and structured graph representations, giving rise to graph contrastive learning (GCL) models. GCL approaches use unsupervised contrastive methods to capture multi-view graph representations by comparing node and graph embeddings. This yields significant improvements in graph-level representations and in task-specific applications such as molecular embedding and classification. However, most GCL techniques are designed to focus on the explicit graph structure through GNN-based encoders. As a result, they often overlook critical topological insights that topological data analysis (TDA) could provide. Given promising research indicating that topological features can greatly benefit various graph learning tasks, we propose TMGCL, a novel topology-enhanced, multi-view graph contrastive learning model.

TMGCL is designed to capture and use both comprehensive multi-scale topological information and global structural information from graphs. This enhanced representation capability positions TMGCL to directly support a range of applications, such as molecular classification, with improved accuracy and robustness. Extensive experiments on two real-world datasets demonstrated the effectiveness of TMGCL and its superior performance compared with state-of-the-art GNN/GCL-based baselines.

Volume 44, Issue 1 (article e202400252).

Citations: 0
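The abstract above does not specify how TMGCL extracts topological features. The Python sketch below illustrates, under stated assumptions, the kind of information topological data analysis adds to a molecular graph. It computes 0- and 1-dimensional persistent homology of a graph under an edge-weight filtration, using a union-find structure:

- Components (H0) merge as edges are added in order of increasing weight.
- Edges that close a cycle create rings (H1).

The edge weights are invented. In practice they might be bond lengths or another per-bond quantity, but that choice is an assumption, not TMGCL's documented method.

```python
import math


def persistence_graph(num_nodes, weighted_edges):
    """0- and 1-dimensional persistent homology of a graph (edge-weight filtration).

    All nodes enter at filtration value 0; each edge enters at its weight.
    Returns (h0_pairs, h1_births):
      h0_pairs  - (birth, death) per connected component; death is inf if it never merges
      h1_births - weights at which an edge closes a cycle (in a graph, cycles never die)
    """
    parent = list(range(num_nodes))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    h0_pairs, h1_births = [], []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru == rv:
            h1_births.append(w)       # both ends already connected: a ring is born
        else:
            parent[ru] = rv           # two components merge; one of them dies at w
            h0_pairs.append((0.0, w))
    survivors = len({find(i) for i in range(num_nodes)})
    h0_pairs += [(0.0, math.inf)] * survivors
    return h0_pairs, h1_births


# Toy molecule (invented weights): a three-membered ring (atoms 0-2) plus substituent 3.
ring_edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.5), (2, 3, 2.0)]
h0, h1 = persistence_graph(4, ring_edges)
print(h0)  # [(0.0, 1.0), (0.0, 1.0), (0.0, 2.0), (0.0, inf)]
print(h1)  # [1.5]
```

At the end of the filtration there is one infinite H0 pair (one connected molecule) and one H1 feature (one ring). These match the graph's Betti numbers β0 = 1 and β1 = 1. The finite pairs record the scales at which the structure assembles.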

The Chemical Space Spanned by Manually Curated Datasets of Natural and Synthetic Compounds with Activities against SARS-CoV-2.

IF 2.8 | Zone 4 (Medicine)

Molecular Informatics | Pub Date: 2025-01-01 | Epub Date: 2024-11-23 | DOI: 10.1002/minf.202400293

Jude Y Betow, Gemma Turon, Clovis S Metuge, Simeon Akame, Vanessa A Shu, Oyere T Ebob, Miquel Duran-Frigola, Fidele Ntie-Kang

Abstract: Viral diseases are challenging to contain because outbreaks can arise and spread very suddenly and viruses mutate rapidly. This makes the development of drugs and vaccines a continued endeavour that requires fast discovery and preparedness. Targeting viral infections with small molecules remains one of the treatment options for reducing transmission and the disease burden. A lesson learned from the recent coronavirus disease (COVID-19) pandemic is to collect ready-to-screen small-molecule libraries in preparation for the next viral outbreak. A clinical candidate might then be found before the outbreak becomes a pandemic. Diverse compound libraries should therefore be publicly available and well annotated in terms of chemical structures and scaffolds, modes of action, and bioactivities. Such libraries are crucial to ensure the participation of academic laboratories in these screening efforts, especially in resource-limited settings where synthesis, testing, and computing capacity are scarce. Here, we demonstrate a low-resource approach to populating the chemical space of naturally occurring and synthetic small molecules. These molecules have shown in vitro and/or in vivo activity against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and its target proteins. We manually curated two datasets of small molecules (naturally occurring and synthetically derived) by reading the published literature and collecting (hand-curating) the reported compounds. The literature reveals that a majority of the reported SARS-CoV-2 compounds act by inhibiting the main protease, while 25% of the compounds currently have no known target.

Scaffold analysis and principal component analysis revealed that the most common scaffolds in the datasets are quite distinct. We then expanded the initial manually curated dataset of over 1200 compounds via an ultra-large-scale 2D and 3D similarity search, obtaining an expanded collection of over 150k purchasable compounds. The spanned chemical space extends significantly beyond that of a commercially available coronavirus library of more than 20k small molecules. Given its manageable size and proximity to hand-curated compounds, it constitutes a good starting collection for virtual screening campaigns.

Citations: 0
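The 2D part of a similarity-based expansion like the one described above is commonly done by comparing binary structural fingerprints with the Tanimoto coefficient. The abstract does not state the paper's fingerprints, cutoffs, or 3D method, so this Python sketch is only an assumed illustration:

- Fingerprints are represented as sets of on-bit indices.
- The compound IDs and bits are invented.
- The 0.5 threshold is arbitrary.

A real pipeline would generate fingerprints with a cheminformatics toolkit and use an indexed search to reach library sizes of this scale.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(queries, library, threshold):
    """IDs of library compounds whose best Tanimoto similarity to any query
    reaches the threshold, ordered from most to least similar."""
    hits = {}
    for lib_id, lib_fp in library.items():
        best = max(tanimoto(q_fp, lib_fp) for q_fp in queries.values())
        if best >= threshold:
            hits[lib_id] = best
    return sorted(hits, key=hits.get, reverse=True)


# Invented fingerprints: curated actives (queries) and a purchasable library.
curated = {"hit_1": {1, 4, 7, 9}, "hit_2": {2, 3, 5}}
vendor = {"cmpd_x": {1, 4, 7, 8}, "cmpd_y": {2, 3, 6}, "cmpd_z": {10, 11}}

print(similarity_search(curated, vendor, threshold=0.5))  # ['cmpd_x', 'cmpd_y']
```

Taking the maximum over all queries keeps any library compound close to at least one curated active. This mirrors the idea of a collection defined by "proximity to hand-curated compounds".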

GCLmf: A Novel Molecular Graph Contrastive Learning Framework Based on Hard Negatives and Application in Toxicity Prediction.

IF 2.8 | Zone 4 (Medicine)

Molecular Informatics | Pub Date: 2025-01-01 | Epub Date: 2024-10-18 | DOI: 10.1002/minf.202400169

Xinxin Yu, Yuanting Chen, Long Chen, Weihua Li, Yuhao Wang, Yun Tang, Guixia Liu

Abstract: In silico methods for predicting chemical toxicity can reduce costs and increase efficiency in the early stages of drug discovery. However, sufficient and reliable toxicity data are hard to obtain, which makes constructing robust and accurate prediction models challenging. Contrastive learning, a type of self-supervised learning, leverages large amounts of unlabeled data to obtain more expressive molecular representations. These representations can boost prediction performance on downstream tasks. Molecular graph contrastive learning has attracted growing attention, but current models neglect the quality of the negative set. Here, we propose GCLmf, a self-supervised pretraining deep learning framework. We used molecular fragments that meet specific conditions as hard negative samples to improve the quality of the negative set. This increases the difficulty of the proxy tasks during pre-training, so that the model learns informative representations. GCLmf showed excellent predictive power on various molecular property benchmarks and high performance on 33 toxicity tasks compared with multiple baselines.

We further investigated whether introducing hard negatives is necessary for model building and how the proportion of hard negatives affects the model.

Citations: 0
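A generic InfoNCE-style contrastive loss shows why hard negatives make the pre-training task harder. The Python sketch below is an assumption for illustration, not GCLmf's published loss. An anchor embedding is scored against one positive view and a set of negatives by cosine similarity divided by a temperature tau (0.5 here, an arbitrary choice). The two runs differ only in the third negative:

- In one, it is orthogonal to the anchor (easy).
- In the other, it lies close to the anchor (hard; for example, a closely related substructure, hypothetical here).

The hard case gives a larger loss, which pushes the encoder to separate similar-looking structures. All vectors are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor, positive, negatives, tau=0.5):
    """InfoNCE loss for one anchor:
    -log( exp(s_pos / tau) / (exp(s_pos / tau) + sum_k exp(s_k / tau)) ),
    where s is the cosine similarity to the anchor."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


anchor = [1.0, 0.0, 0.0]                         # one augmented view of a molecule
positive = [0.9, 0.1, 0.0]                       # another view of the same molecule
easy_negs = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # unrelated molecules
extra_easy = [[0.0, -1.0, 0.0]]                  # one more unrelated molecule
hard_neg = [[0.8, 0.3, 0.0]]                     # negative lying close to the anchor

loss_easy = info_nce(anchor, positive, easy_negs + extra_easy)
loss_hard = info_nce(anchor, positive, easy_negs + hard_neg)
print(loss_easy < loss_hard)  # True
```

Both runs use the same number of negatives, so the difference comes from similarity alone. That is the property hard-negative mining exploits.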

Structural and Dynamic Assessment of Disease-Causing Mutations for the Carnitine Transporter OCTN2.

IF 3.1 | Zone 4 (Medicine)

Molecular Informatics | Pub Date: 2025-01-01 | DOI: 10.1002/minf.202400002

Johannes Jokiel, Marcel Bermudez

Citations: 0

Interpret Gaussian Process Models by Using Integrated Gradients.

IF 2.8 | Zone 4 (Medicine)

Molecular Informatics | Pub Date: 2025-01-01 | Epub Date: 2024-11-26 | DOI: 10.1002/minf.202400051

Fan Zhang, Naoaki Ono, Shigehiko Kanaya

Abstract: Gaussian process regression (GPR) is a nonparametric probabilistic model that computes not only the predicted mean but also the predicted standard deviation, which represents the confidence level of its predictions. It offers great flexibility:

- It can be made nonlinear through the design of the kernel function.
- It can be made robust against outliers by changing the likelihood function.
- It can be extended to classification models.

Recently, models combining deep learning with GPR, such as Deep Kernel Learning GPR, have been proposed and reported to achieve higher accuracy than GPR. However, because of its nonparametric nature, GPR is challenging to interpret. Explainable AI (XAI) methods such as LIME or kernel SHAP can interpret the predicted mean, but interpreting the predicted standard deviation remains difficult. In this study, we propose a novel method to interpret GPR predictions by evaluating the importance of explanatory variables. We combined the GPR model with the Integrated Gradients (IG) method to assess the contribution of each feature to the prediction. By evaluating the standard deviation of the posterior distribution, we show that the IG approach provides a detailed decomposition of the predictive uncertainty. It attributes this uncertainty to the uncertainty in individual feature contributions. This methodology highlights the variables that are most influential in the prediction. It also provides insights into the reliability of the model by quantifying the uncertainty associated with each feature.

This gives a deeper understanding of the model's behavior and fosters trust in its predictions, especially in domains where interpretability is as crucial as accuracy.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11695984/pdf/

Citations: 0
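Integrated Gradients attributes the change in a model output F between a baseline x' and an input x to each feature i as

IG_i = (x_i - x'_i) * ∫₀¹ ∂F/∂x_i(x' + t(x - x')) dt

The attributions sum to F(x) - F(x'). The Python sketch below applies this to the posterior mean of a GP with a squared-exponential (RBF) kernel. It uses the analytic kernel gradient and a midpoint Riemann sum for the integral. It is a generic illustration under assumed fixed hyperparameters (length scale 1, noise 0.01) and invented toy data. It covers only the predicted mean and does not reproduce the authors' decomposition of the predictive standard deviation.

```python
import math


def rbf(x, z, length=1.0):
    """Squared-exponential kernel k(x, z) = exp(-||x - z||^2 / (2 * length^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * length ** 2))


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def fit_gp_mean(X, y, noise=1e-2, length=1.0):
    """Weights alpha = (K + noise * I)^-1 y, so that mean(x) = sum_j alpha_j k(x, x_j)."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    return solve(K, y)


def gp_mean(x, X, alpha, length=1.0):
    """GP posterior mean at x (zero prior mean)."""
    return sum(a * rbf(x, xj, length) for a, xj in zip(alpha, X))


def gp_mean_grad(x, X, alpha, length=1.0):
    """Analytic gradient: dk(x, x_j)/dx_i = -(x_i - x_ji) / length^2 * k(x, x_j)."""
    grad = [0.0] * len(x)
    for a, xj in zip(alpha, X):
        k = rbf(x, xj, length)
        for i in range(len(x)):
            grad[i] -= a * (x[i] - xj[i]) / length ** 2 * k
    return grad


def integrated_gradients(x, baseline, X, alpha, steps=200, length=1.0):
    """IG_i = (x_i - b_i) * average of d(mean)/dx_i along the straight path (midpoint rule)."""
    totals = [0.0] * len(x)
    for s in range(steps):
        t = (s + 0.5) / steps
        point = [b + t * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gp_mean_grad(point, X, alpha, length)):
            totals[i] += g
    return [(xi - b) * tot / steps for xi, b, tot in zip(x, baseline, totals)]


# Invented toy data: the target grows mainly with the first feature.
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
y_train = [0.0, 1.0, 0.1, 1.1, 0.55]
alpha = fit_gp_mean(X_train, y_train)

x, base = [1.0, 1.0], [0.0, 0.0]
ig = integrated_gradients(x, base, X_train, alpha)
gap = gp_mean(x, X_train, alpha) - gp_mean(base, X_train, alpha)
print(ig, sum(ig), gap)
```

The completeness property gives a quick sanity check. The printed sum of attributions should agree with the difference in predicted means, up to the small Riemann-sum error.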

Extended Activity Cliffs-Driven Approaches on Data Splitting for the Study of Bioactivity Machine Learning Predictions.

IF 2.8 | Zone 4 (Medicine)

Molecular Informatics | Pub Date: 2025-01-01 | Epub Date: 2024-11-18 | DOI: 10.1002/minf.202400054

Kenneth López-Pérez, Ramón Alain Miranda-Quintana

Citations: 0

From High Dimensions to Human Insight: Exploring Dimensionality Reduction for Chemical Space Visualization.

IF 2.8 | Zone 4 (Medicine)

Molecular Informatics | Pub Date: 2025-01-01 | Epub Date: 2024-12-05 | DOI: 10.1002/minf.202400265

Alexey A Orlov, Tagir N Akhmetshin, Dragos Horvath, Gilles Marcou, Alexandre Varnek

Citations: 0

GDMol: Generative Double-Masking Self-Supervised Learning for Molecular Property Prediction.

IF 2.8 | Zone 4 (Medicine)

Molecular Informatics | Pub Date: 2025-01-01 | Epub Date: 2024-10-24 | DOI: 10.1002/minf.202400146

Yingxu Liu, Qing Fan, Chengcheng Xu, Xiangzhen Ning, Yu Wang, Yang Liu, Yu Xie, Yanmin Zhang, Yadong Chen, Haichun Liu

Abstract:

Background: Effective molecular feature representation is crucial for drug property prediction. In recent years, graph neural networks (GNNs) pre-trained with self-supervised learning techniques have received increasing attention as a way to overcome the scarcity of labeled data in molecular property prediction. Traditional GNNs for self-supervised molecular property prediction typically perform a single masking operation on the nodes and edges of the input molecular graph. This masks only local information and is insufficient for thorough self-supervised training.

Method: We therefore propose GDMol, a model for molecular property prediction based on generative double-masking self-supervised learning. GDMol integrates generative learning into the self-supervised learning framework to obtain latent representations, then applies a second round of masking to these latent representations. This lets the model better capture global information and semantic knowledge of the molecules. The result is a richer, more informative representation and thereby more accurate and robust molecular property prediction.

Results: Our experiments on 5 datasets demonstrated the superior performance of GDMol in predicting molecular properties across different domains. Moreover, we used the masking operation to traverse the gradient changes of each node. Their magnitude and sign reflect, respectively, the strength and the direction (positive or negative) of each local structure's contribution to the prediction outcome. This in-depth interpretative analysis enhances the model's interpretability. It also provides more targeted insights and direction for optimizing drug molecules.

Conclusions: In summary, this research offers novel insights into improving molecular property prediction. It paves the way for further research on applying generative learning and self-supervised learning in chemistry.

Citations: 0
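The node-level interpretation described in the GDMol Results paragraph can be approximated by occlusion. Occlusion masks one node at a time and records how the prediction changes: the sign gives the direction of the node's contribution and the magnitude its strength. GDMol's description refers to gradient changes, whereas this Python sketch swaps in plain prediction differences. The model is a hypothetical stand-in (mean-pooled node features followed by a linear readout), not GDMol. The atom features and weights are invented.

```python
def predict(node_feats, weights):
    """Hypothetical stand-in model: mean-pool node features, then a linear readout."""
    n = len(node_feats)
    pooled = [sum(f[d] for f in node_feats) / n for d in range(len(weights))]
    return sum(w * p for w, p in zip(weights, pooled))


def masking_attribution(node_feats, weights, mask_value=0.0):
    """Score of node i = prediction(full graph) - prediction(with node i masked).

    A positive score means the node raises the prediction, a negative one lowers it,
    and the absolute value indicates how strongly.
    """
    full = predict(node_feats, weights)
    scores = []
    for i in range(len(node_feats)):
        masked = [[mask_value] * len(f) if j == i else f for j, f in enumerate(node_feats)]
        scores.append(full - predict(masked, weights))
    return scores


# Three atoms with two invented features each, and invented readout weights.
atoms = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w = [2.0, -1.0]
print(masking_attribution(atoms, w))
```

For this linear stand-in with zero-valued masks, the scores sum to the full prediction. For a nonlinear GNN they generally would not, which is one reason gradient-based variants are also used.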