Molecular Informatics | Pub Date: 2024-12-01 | Epub Date: 2024-10-15 | DOI: 10.1002/minf.202400044
Gloria Geine Paendong, Soualihou Ngnamsie Njimbouom, Candra Zonyfar, Jeong-Dong Kim
{"title":"ERL-ProLiGraph: Enhanced representation learning on protein-ligand graph structured data for binding affinity prediction.","authors":"Gloria Geine Paendong, Soualihou Ngnamsie Njimbouom, Candra Zonyfar, Jeong-Dong Kim","doi":"10.1002/minf.202400044","DOIUrl":"10.1002/minf.202400044","url":null,"abstract":"<p><p>Predicting Protein-Ligand Binding Affinity (PLBA) is pivotal in drug development, as accurate estimations of PLBA expedite the identification of promising drug candidates for specific targets, thereby accelerating the drug discovery process. Despite substantial advancements in PLBA prediction, developing an efficient and more accurate method remains non-trivial. Unlike previous computer-aided PLBA studies, which primarily used ligand SMILES and protein sequences represented as strings, this research introduces a Deep Learning-based method, the Enhanced Representation Learning on Protein-Ligand Graph Structured data for Binding Affinity Prediction (ERL-ProLiGraph). The unique aspect of this method is the use of graph representations for both proteins and ligands, with the aim of learning the structural information contained in both to enhance the accuracy of PLBA predictions. In these graphs, nodes represent atomic structures, while edges depict chemical bonds and spatial relationships. The proposed model, leveraging deep-learning algorithms, effectively learns to correlate these graphical representations with binding affinities. This graph-based representation approach enhances the model's ability to capture the complex molecular interactions critical in PLBA. This work represents a promising advancement in computational techniques for protein-ligand binding prediction, offering a potential path toward more efficient and accurate predictions in drug development. 
Comparative analysis indicates that the proposed ERL-ProLiGraph outperforms previous models, showcasing notable efficacy and providing a more suitable approach for accurate PLBA predictions.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202400044"},"PeriodicalIF":2.8,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639045/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142470300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Molecular Informatics | Pub Date: 2024-12-01 | Epub Date: 2024-08-22 | DOI: 10.1002/minf.202400114
Mykola V Protopopov, Valentyna V Tararina, Fanny Bonachera, Igor M Dzyuba, Anna Kapeliukha, Serhii Hlotov, Oleksii Chuk, Gilles Marcou, Olga Klimchuk, Dragos Horvath, Erik Yeghyan, Olena Savych, Olga O Tarkhanova, Alexandre Varnek, Yurii S Moroz
{"title":"The freedom space - a new set of commercially available molecules for hit discovery.","authors":"Mykola V Protopopov, Valentyna V Tararina, Fanny Bonachera, Igor M Dzyuba, Anna Kapeliukha, Serhii Hlotov, Oleksii Chuk, Gilles Marcou, Olga Klimchuk, Dragos Horvath, Erik Yeghyan, Olena Savych, Olga O Tarkhanova, Alexandre Varnek, Yurii S Moroz","doi":"10.1002/minf.202400114","DOIUrl":"10.1002/minf.202400114","url":null,"abstract":"<p><p>The advent of high-performance virtual screening techniques now allows drug designers to explore ultra-large sets of candidate compounds in search of molecules predicted to have desired properties. However, the success of such an endeavor relies heavily on the pertinence (drug-likeness and, foremost, chemical feasibility) of these candidates; otherwise, virtual screening will return valueless \"hits\", by the garbage in/garbage out principle. The huge popularity of the judiciously enumerated Enamine REAL Space is clear proof of the strength of this Big Data trend in drug discovery. Here we describe a new dataset of make-on-demand compounds called the Freedom Space. It follows the principles of Enamine REAL Space and contains highly feasible molecules (synthesis success rate over 75 percent). However, scaffold and chemography analyses revealed significant differences from both REAL Space compounds and biologically annotated compounds from the ChEMBL database. 
The Freedom Space is a significant extension of the REAL Space and can be utilized for a more comprehensive exploration of the synthetically feasible chemical space in hit finding and hit-to-lead campaigns.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202400114"},"PeriodicalIF":2.8,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142018020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Molecular Informatics | Pub Date: 2024-12-01 | Epub Date: 2024-10-15 | DOI: 10.1002/minf.202400037
Johann Gasteiger
{"title":"Review of the 8<sup>th</sup> autumn school in chemoinformatics.","authors":"Johann Gasteiger","doi":"10.1002/minf.202400037","DOIUrl":"10.1002/minf.202400037","url":null,"abstract":"<p><p>This paper gives an overview of the lectures and posters presented at the 8th Autumn School in Chemoinformatics held in Nara, Japan on 28th - 30th November 2023. The topics ranged from the study of chemical reactions through drug design and the use of Chemical Language Models and electronic structure informatics to the modeling of materials. In addition, a brief overview of the 50 years of work in chemoinformatics by Johann Gasteiger is given with an emphasis on the essential decisions during his scientific career.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202400037"},"PeriodicalIF":2.8,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639044/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142470304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Molecular Informatics | Pub Date: 2024-12-01 | Epub Date: 2024-07-09 | DOI: 10.1002/minf.202400032
Sergey M Ivanov, Anastasia V Rudik, Alexey A Lagunin, Dmitry A Filimonov, Vladimir V Poroikov
{"title":"DIGEP-Pred 2.0: A web application for predicting drug-induced cell signaling and gene expression changes.","authors":"Sergey M Ivanov, Anastasia V Rudik, Alexey A Lagunin, Dmitry A Filimonov, Vladimir V Poroikov","doi":"10.1002/minf.202400032","DOIUrl":"10.1002/minf.202400032","url":null,"abstract":"<p><p>The analysis of drug-induced gene expression profiles (DIGEP) is widely used to estimate the potential therapeutic and adverse drug effects as well as the molecular mechanisms of drug action. However, the corresponding experimental data is absent for many existing drugs and drug-like compounds. To solve this problem, we created the DIGEP-Pred 2.0 web application, which predicts DIGEP and potential drug targets from the structural formula of drug-like compounds. It is based on the combined use of structure-activity relationships (SARs) and network analysis. SAR models were created using PASS (Prediction of Activity Spectra for Substances) technology for data from the Comparative Toxicogenomics Database (CTD), the Connectivity Map (CMap) for the prediction of DIGEP, and PubChem and ChEMBL for the prediction of molecular mechanisms of action (MoA). Using only the structural formula of a compound, the user can obtain information on potential gene expression changes in several cell lines and drug targets, which are potential master regulators responsible for the observed DIGEP. The mean accuracy of prediction calculated by leave-one-out cross validation was 86.5 % for 13377 genes and 94.8 % for 2932 proteins (CTD data), and it was 97.9 % for 2170 MoAs. SAR models (mean accuracy 87.5 %) were also created for CMap data given on MCF7, PC3, and HL60 cell lines with different threshold values for the logarithm of fold changes: 0.5, 0.7, 1, 1.5, and 2. 
Additionally, the data on pathways (KEGG, Reactome), biological processes of Gene Ontology, and diseases (DisGeNet) enriched by the predicted genes, together with the estimation of target-master regulators based on OmniPath data, is also provided. DIGEP-Pred 2.0 web application is freely available at https://www.way2drug.com/digep-pred.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202400032"},"PeriodicalIF":2.8,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141559261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pathway-based prediction of the therapeutic effects and mode of action of custom-made multiherbal medicines.","authors":"Akihiro Ezoe, Yuki Shimada, Ryusuke Sawada, Akihiro Douke, Tomokazu Shibata, Makoto Kadowaki, Yoshihiro Yamanishi","doi":"10.1002/minf.202400108","DOIUrl":"10.1002/minf.202400108","url":null,"abstract":"<p><p>Multiherbal medicines are traditionally used as personalized medicines with custom combinations of crude drugs; however, the mechanisms of multiherbal medicines are unclear. In this study, we developed a novel pathway-based method to predict therapeutic effects and the mode of action of custom-made multiherbal medicines using machine learning. This method considers disease-related pathways as therapeutic targets and evaluates the comprehensive influence of constituent compounds on their potential target proteins in the disease-related pathways. Our proposed method enabled us to comprehensively predict new indications of 194 Kampo medicines for 87 diseases. Using Kampo-induced transcriptomic data, we demonstrated that Kampo constituent compounds stimulated the disease-related proteins and a customized Kampo formula enhanced the efficacy compared with an existing Kampo formula. The proposed method will be useful for discovering effective Kampo medicines and optimizing custom-made multiherbal medicines in practice.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202400108"},"PeriodicalIF":2.8,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142470303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Molecular Informatics | Pub Date: 2024-11-01 | Epub Date: 2024-06-05 | DOI: 10.1002/minf.202400060
Fernando Martínez-Urrutia, José L Medina-Franco
{"title":"BIOMX-DB: A web application for the BIOFACQUIM natural product database.","authors":"Fernando Martínez-Urrutia, José L Medina-Franco","doi":"10.1002/minf.202400060","DOIUrl":"10.1002/minf.202400060","url":null,"abstract":"<p><p>Natural product databases are an integral part of chemoinformatics and computer-aided drug design. Despite their pivotal role, few projects in Latin America, particularly in Mexico, provide accessible tools of this nature. Herein, we introduce BIOMX-DB, an open and freely accessible web-based database designed to address this gap. BIOMX-DB enhances the features of the existing Mexican natural product database, BIOFACQUIM, by incorporating advanced search, filtering, and download capabilities. The user-friendly interface of BIOMX-DB aims to provide an intuitive experience for researchers. BIOMX-DB is freely available at www.biomx-db.com.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202400060"},"PeriodicalIF":2.8,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141262372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Molecular Informatics | Pub Date: 2024-11-01 | Epub Date: 2024-10-15 | DOI: 10.1002/minf.202400082
Igor Baskin, Yair Ein-Eli
{"title":"Chemoinformatics for corrosion science: Data-driven modeling of corrosion inhibition by organic molecules.","authors":"Igor Baskin, Yair Ein-Eli","doi":"10.1002/minf.202400082","DOIUrl":"10.1002/minf.202400082","url":null,"abstract":"<p><p>This paper reviews the application of machine learning to the inhibition of corrosion by organic molecules. The methodologies considered include quantitative structure-property relationships (QSPR) and related data-driven approaches. The characteristic features of their key components are considered as applied to corrosion inhibition, including datasets, response properties, molecular descriptors, machine learning methods, and structure-property models. It is shown that the most important factors determining their choice and application features are: (1) the small or very small size of datasets, (2) the mechanism of corrosion inhibition associated with the adsorption of inhibitor molecules on the metal surface, and (3) the multifactorial conditioning and noisiness of the response property. On this basis, the application of machine learning to the inhibition of corrosion of materials based on iron, aluminum, and magnesium is considered. 
The main trends in the development of QSPR and related data-driven modeling of corrosion inhibition are discussed, the shortcomings and common errors are considered, and the prospects for their further development are outlined.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202400082"},"PeriodicalIF":2.8,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142470299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Molecular Informatics | Pub Date: 2024-11-01 | Epub Date: 2024-10-15 | DOI: 10.1002/minf.202400036
Johann Gasteiger
{"title":"My 50 Years with Chemoinformatics.","authors":"Johann Gasteiger","doi":"10.1002/minf.202400036","DOIUrl":"10.1002/minf.202400036","url":null,"abstract":"","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202400036"},"PeriodicalIF":2.8,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142470302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Molecular Informatics | Pub Date: 2024-10-01 | Epub Date: 2024-07-08 | DOI: 10.1002/minf.202400079
Moritz Walter, Jens M Borghardt, Lina Humbeck, Miha Skalic
{"title":"Multi-Task ADME/PK prediction at industrial scale: leveraging large and diverse experimental datasets.","authors":"Moritz Walter, Jens M Borghardt, Lina Humbeck, Miha Skalic","doi":"10.1002/minf.202400079","DOIUrl":"10.1002/minf.202400079","url":null,"abstract":"<p><p>ADME (Absorption, Distribution, Metabolism, Excretion) properties are key parameters to judge whether a drug candidate exhibits a desired pharmacokinetic (PK) profile. In this study, we tested multi-task machine learning (ML) models, trained on in-house data generated at Boehringer Ingelheim, to predict ADME and animal PK endpoints. Models were evaluated both at the design stage of a compound (i.e., no experimental data of test compounds available) and at the testing stage, when a particular assay would be conducted (i.e., experimental data of earlier conducted assays may be available). Using realistic time-splits, we found a clear benefit in performance of multi-task graph-based neural network models over single-task models, which was even stronger when experimental data of earlier assays was available. In an attempt to explain the success of multi-task models, we found that especially endpoints with the largest numbers of data points (physicochemical endpoints, clearance in microsomes) are responsible for increased predictivity in more complex ADME and PK endpoints. 
In summary, our study provides insight into how data for multiple ADME/PK endpoints in a pharmaceutical company can be best leveraged to optimize predictivity of ML models.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202400079"},"PeriodicalIF":2.8,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141555197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Molecular Informatics | Pub Date: 2024-10-01 | Epub Date: 2024-07-24 | DOI: 10.1002/minf.202400046
Guillaume Patient, Corentin Bedart, Naim A Khan, Nicolas Renault, Amaury Farce
{"title":"Distinct binding hotspots for natural and synthetic agonists of FFA4 from in silico approaches.","authors":"Guillaume Patient, Corentin Bedart, Naim A Khan, Nicolas Renault, Amaury Farce","doi":"10.1002/minf.202400046","DOIUrl":"10.1002/minf.202400046","url":null,"abstract":"<p><p>FFA4 has gained interest since its deorphanization in 2005 and the characterization of the Free Fatty Acid receptor family, owing to the therapeutic potential of these receptors in metabolic disorders. The expression of FFA4 (also known as GPR120) in numerous organs throughout the human body makes this receptor a highly potent target, particularly in fat sensing and diet preference. This offers an attractive approach to tackle obesity and related metabolic diseases. Recent cryo-EM structures of the receptor have provided valuable information on a potential active state, although previous studies of FFA4 presented diverging information. We performed molecular docking and molecular dynamics simulations of four agonist ligands, TUG-891, Linoleic acid, α-Linolenic acid, and Oleic acid, based on a homology model. Our simulations, totaling 2 μs, highlighted two binding hotspots at Arg99<sup>2.64</sup> and Lys293 (ECL3). The results indicate that the residues are located in separate areas of the binding pocket and interact with various types of ligands, implying different potential active states of FFA4 and a highly adaptable intra-receptor binding pocket. 
This article proposes additional structural characteristics and mechanisms for agonist binding that complement the experimental structures.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202400046"},"PeriodicalIF":2.8,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141752164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}