{"title":"An Integrated Fuzzy Neural Network and Topological Data Analysis for Molecular Graph Representation Learning and Property Forecasting.","authors":"Phu Pham","doi":"10.1002/minf.202400335","DOIUrl":"10.1002/minf.202400335","url":null,"abstract":"<p><p>Over the past decade, graph neural networks (GNNs) have emerged as a powerful neural architecture for modelling graph-structured data and for task-driven representation learning. Recent studies have highlighted the remarkable capabilities of GNNs in handling complex graph representation learning tasks, achieving state-of-the-art results in node/graph classification, regression, and generation. However, most traditional GNN-based architectures, such as GCN and GraphSAGE, still face several challenges in preserving multi-scale topological structures. These models primarily focus on capturing local neighborhood information, often failing to retain global structural features essential for graph-level representation and classification tasks. Furthermore, their expressiveness is limited when learning topological structures in complex molecular graph datasets. To overcome these limitations, we propose FTPG, a novel graph neural architecture that integrates a neuro-fuzzy network with a topological graph learning approach. Specifically, FTPG introduces a novel approach to molecular graph representation and property prediction by integrating multi-scale topological graph learning with advanced neural components. The architecture employs separate graph neural learning modules to capture both local graph-based structures and global topological features. Moreover, to address feature uncertainty in the global-view representation, a multi-layered neuro-fuzzy network is incorporated into the model to enhance the robustness and expressiveness of the learned molecular graph embeddings. 
This combined approach leverages the strengths of multi-view and multi-modal neural learning, enabling FTPG to deliver superior performance in molecular graph tasks. Extensive experiments on real-world/benchmark molecular datasets demonstrate the effectiveness of the proposed FTPG model. It consistently outperforms state-of-the-art GNN-based baselines from different categories, including canonical local-proximity message-passing, graph transformer-based, and topology-driven approaches.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 3","pages":"e202400335"},"PeriodicalIF":2.8,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143616256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aseel Yasin Matrouk, Haneen Mohammad, Safa Daoud, Mutasem Omar Taha
{"title":"Discovery of New HER2 Inhibitors via Computational Docking, Pharmacophore Modeling, and Machine Learning.","authors":"Aseel Yasin Matrouk, Haneen Mohammad, Safa Daoud, Mutasem Omar Taha","doi":"10.1002/minf.202400336","DOIUrl":"10.1002/minf.202400336","url":null,"abstract":"<p><p>The human epidermal growth factor receptor 2 (HER2) is a critical oncogene implicated in the development of various aggressive cancers, particularly breast cancer. Discovering novel HER2 inhibitors is crucial for expanding therapeutic options for HER2-related malignancies. In this study, we present a computational workflow that focuses on generating pharmacophores derived from docked poses of a selected list of 15 diverse, potent HER2 inhibitors, utilizing flexible docking. The resulting pharmacophores, along with other physicochemical molecular descriptors, were then evaluated in a machine learning-quantitative structure-activity relationship (ML-QSAR) analysis against 1,272 HER2 inhibitors. Several machine learning methods were assessed, and a genetic function algorithm (GFA) was employed for feature selection. Ultimately, GFA combined with Bagging and J48Graft classifiers produced the best self-consistent and predictive models. These models highlighted the significance of two pharmacophores, Hypo_1 and Hypo_2, in distinguishing potent from less active inhibitors. The successful ML-QSAR models and their associated pharmacophores were used to screen the National Cancer Institute (NCI) database for novel HER2 inhibitors. Three promising anti-HER2 leads were identified, with the top-performing lead demonstrating an experimental anti-HER2 IC<sub>50</sub> value of 3.85 μM. 
Notably, the three inhibitors exhibited distinct chemical scaffolds compared to existing HER2 inhibitors, as indicated by principal component analysis.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 2","pages":"e202400336"},"PeriodicalIF":2.8,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143458679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MAYA (Multiple ActivitY Analyzer): An Open Access Tool to Explore Structure-Multiple Activity Relationships in the Chemical Universe.","authors":"J Israel Espinoza-Castañeda, José L Medina-Franco","doi":"10.1002/minf.202400306","DOIUrl":"10.1002/minf.202400306","url":null,"abstract":"<p><p>Herein, we introduce MAYA (Multiple Activity Analyzer), a tool designed to automatically construct a chemical multiverse by generating multiple visualizations of the chemical space of a compound data set described by structural descriptors of different natures, such as Molecular ACCess System (MACCS) keys, extended connectivity fingerprints with different radii, molecular descriptors with pharmaceutical relevance, and bioactivity descriptors. These representations are integrated with various data visualization techniques for automated analysis of structure-multiple activity/property relationships, enabling the analysis of a variety of problems in user-friendly open-source software. The source code of MAYA is freely available on GitHub at https://github.com/IsrC11/MAYA.git.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 2","pages":"e202400306"},"PeriodicalIF":2.8,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11812492/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143391311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kenji Hori, Yujiro Matsuo, Toru Yamaguchi, Kimito Funatsu
{"title":"An Attempt to Classify Elementary Reactions on the Basis of TS Motifs.","authors":"Kenji Hori, Yujiro Matsuo, Toru Yamaguchi, Kimito Funatsu","doi":"10.1002/minf.202400040","DOIUrl":"10.1002/minf.202400040","url":null,"abstract":"<p><p>Reactions commonly used in synthetic organic chemistry are named after their discoverers or developers. These are called name reactions and generally consist of several elementary reactions. Quantum chemical calculations can optimize the transition state (TS) structures of the elementary reactions. The geometrical feature of a TS is called a TS motif. We have constructed a database (QMRDB) containing TS motif information and continue to add to it. In the present study, we extracted 102 elementary reactions from the QMRDB and attempted to classify them using a Kohonen self-organizing map. As a result, all the TS motifs were clustered. By firing a target compound on the generated Kohonen map, we expect to be able to easily find the TS motifs most similar to the target.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 2","pages":"e202400040"},"PeriodicalIF":2.8,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11833755/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143440926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Massina Abderrahmane, Hamza Tajmouati, Vinicius Barros Ribeiro da Silva, Quentin Perron
{"title":"Predicting the Price of Molecules Using Their Predicted Synthetic Pathways.","authors":"Massina Abderrahmane, Hamza Tajmouati, Vinicius Barros Ribeiro da Silva, Quentin Perron","doi":"10.1002/minf.202400039","DOIUrl":"10.1002/minf.202400039","url":null,"abstract":"<p><p>Currently, numerous metrics allow chemists and computational chemists to refine and filter libraries of virtual molecules in order to prioritize their synthesis. Some of the most commonly used metrics and models are QSAR models, docking scores, diverse druggability metrics, and synthetic feasibility scores, to name only a few. To our knowledge, among the known metrics, a function that estimates the price of a novel virtual molecule and takes into account the availability and price of starting materials has not been considered before in the literature. Being able to make such a prediction could improve and accelerate the decision-making process related to the cost of goods. Taking advantage of recent advances in the field of Computer-Aided Synthetic Planning (CASP), we decided to investigate whether the predicted retrosynthetic pathways of a given molecule and the prices of its associated starting materials could be good features for predicting the price of that compound. In this work, we present a deep learning model, RetroPriceNet, that predicts the price of molecules using their predicted synthetic pathways. On a holdout test set, the model achieves better performance than the state-of-the-art model. 
The developed approach takes into account the synthetic feasibility of molecules and the availability and prices of the starting materials.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 2","pages":"e202400039"},"PeriodicalIF":2.8,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143066819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dorsa Dadashi, Marjan Kaedi, Parsa Dadashi, Suprakas Sinha Ray
{"title":"Prediction of the Appropriate Temperature and Pressure for Polymer Dissolution Using Machine Learning Models.","authors":"Dorsa Dadashi, Marjan Kaedi, Parsa Dadashi, Suprakas Sinha Ray","doi":"10.1002/minf.202400193","DOIUrl":"10.1002/minf.202400193","url":null,"abstract":"<p><p>The widespread use of polymer solutions in the chemical industry poses a significant challenge in determining optimal dissolution conditions. Traditionally, researchers have relied on experimental methods to estimate the processing parameters needed to dissolve polymers, often requiring numerous iterations of testing different temperatures and pressures. This approach is both costly and time-consuming. In this study, we present, for the first time, a machine learning-based approach to predict the minimum temperature and pressure required for polymer dissolution, correlating them with the molecular weight and chemical structure of both the polymer and the solvent and with the weight percent. Using a dataset compiled from existing literature, which includes key factors influencing polymer dissolution, we also extracted chemical bond information from the molecular structures of polymer-solvent systems. Six different machine learning algorithms, namely linear regression, k-nearest neighbors, regression trees, random forests, multilayer perceptron neural networks, and support vector regression, were employed to develop predictive models. Among these, the random forest model achieved the highest accuracy, with R<sup>2</sup> values of 0.931 and 0.942 for temperature and pressure predictions, respectively. 
This novel approach eliminates the need for repetitive experimental testing, offering a more efficient pathway to determining dissolution conditions.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 2","pages":"e202400193"},"PeriodicalIF":2.8,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143391324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"KNIME Workflows for Chemoinformatic Characterization of Chemical Databases.","authors":"Carlos D Ramírez-Márquez, José L Medina-Franco","doi":"10.1002/minf.202400337","DOIUrl":"10.1002/minf.202400337","url":null,"abstract":"<p><p>In chemoinformatics, chemical databases are of great importance because their main objective is to store and organize the chemical structures of molecules and their properties, from basic information such as chemical structure to more complex data such as molecular fingerprints, other types of calculated or experimental descriptors, and biological activity. However, these data can be used in projects to identify novel therapeutic molecules, or in other fields, only if they are correctly characterized and analyzed. In this Application Note, we compiled five workflows within the open-source data analytics and visualization platform KNIME that can be implemented for the chemoinformatic characterization of databases. To illustrate the application of the workflows, we used BIOFACQUIM, a compound database of natural products isolated and characterized in Mexico [1].</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 2","pages":"e202400337"},"PeriodicalIF":2.8,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143365158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploration of the Global Minimum and Conical Intersection with Bayesian Optimization.","authors":"Riho Somaki, Taichi Inagaki, Miho Hatanaka","doi":"10.1002/minf.202400041","DOIUrl":"10.1002/minf.202400041","url":null,"abstract":"<p><p>Conventional molecular geometry searches on a potential energy surface (PES) utilize energy gradients from quantum chemical calculations. However, replacing energy calculations with noisy quantum computer measurements generates errors in the energies, which makes geometry optimization using the energy gradient difficult. One gradient-free optimization method that can potentially solve this problem is Bayesian optimization (BO). To use BO in geometry search, an acquisition function (AF), which involves an objective variable, must be defined suitably. In this study, we propose a strategy for geometry searches using BO and examine the appropriate AFs to explore two critical structures: the global minimum (GM) on the singlet ground state (S<sub>0</sub>) and the most stable conical intersection (CI) point between S<sub>0</sub> and the singlet excited state. We applied our strategy to two molecules and located the GM and the most stable CI geometries with high accuracy for both molecules. 
We also succeeded in the geometry searches even when artificial random noise was added to the energies to simulate geometry optimization using noisy quantum computer measurements.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 2","pages":"e202400041"},"PeriodicalIF":2.8,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11781018/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143066818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Molecular Informatics. Pub Date: 2025-01-01. Epub Date: 2024-12-05. DOI: 10.1002/minf.202400305
Gabriel Corrêa Veríssimo, Rafaela Salgado Ferreira, Vinícius Gonçalves Maltarollo
{"title":"Ultra-Large Virtual Screening: Definition, Recent Advances, and Challenges in Drug Design.","authors":"Gabriel Corrêa Veríssimo, Rafaela Salgado Ferreira, Vinícius Gonçalves Maltarollo","doi":"10.1002/minf.202400305","DOIUrl":"10.1002/minf.202400305","url":null,"abstract":"<p><p>Virtual screening (VS) in drug design employs computational methodologies to systematically rank molecules from a virtual compound library based on predicted features related to their biological activities or chemical properties. The recent expansion in commercially accessible compound libraries and the advancements in artificial intelligence (AI) and computational power - including enhanced central processing units (CPUs), graphics processing units (GPUs), high-performance computing (HPC), and cloud computing - have significantly expanded our capacity to screen libraries containing over 10<sup>9</sup> molecules. Herein, we review the concept of ultra-large virtual screening (ULVS), focusing on the various algorithms and methodologies employed for virtual screening at this scale. In this context, we present the software utilized, applications, and results of different approaches, such as brute force docking, reaction-based docking approaches, machine learning (ML) strategies applied to docking or other VS methods, and similarity/pharmacophore search-based techniques. 
These examples represent a paradigm shift in the drug discovery process, demonstrating not only the feasibility of billion-scale compound screening but also its potential to identify hit candidates and increase the structural diversity of novel compounds with biological activities.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202400305"},"PeriodicalIF":2.8,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142780630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
David F Nippa, Alex T Müller, Kenneth Atz, David B Konrad, Uwe Grether, Rainer E Martin, Gisbert Schneider
{"title":"Simple User-Friendly Reaction Format.","authors":"David F Nippa, Alex T Müller, Kenneth Atz, David B Konrad, Uwe Grether, Rainer E Martin, Gisbert Schneider","doi":"10.1002/minf.202400361","DOIUrl":"10.1002/minf.202400361","url":null,"abstract":"<p><p>Utilizing the growing wealth of chemical reaction data can boost synthesis planning and increase success rates. Yet, the effectiveness of machine learning tools for retrosynthesis planning and forward reaction prediction relies on accessible, well-curated data presented in a structured format. Although some public and licensed reaction databases exist, they often lack essential information about reaction conditions. To address this issue and promote the principles of findable, accessible, interoperable, and reusable (FAIR) data reporting and sharing, we introduce the Simple User-Friendly Reaction Format (SURF). SURF standardizes the documentation of reaction data through a structured tabular format, requiring only a basic understanding of spreadsheets. This format enables chemists to record the synthesis of molecules in a format that is understandable by both humans and machines, which facilitates seamless sharing and integration directly into machine learning pipelines. SURF files are designed to be interoperable, easily imported into relational databases, and convertible into other formats. This complements existing initiatives like the Open Reaction Database (ORD) and Unified Data Model (UDM). 
At Roche, SURF plays a crucial role in democratizing FAIR reaction data sharing and expediting the chemical synthesis process.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"44 1","pages":"e202400361"},"PeriodicalIF":2.8,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11755691/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143024131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}