Sergey M Ivanov, Anastasia V Rudik, Alexey A Lagunin, Dmitry A Filimonov, Vladimir V Poroikov
{"title":"DIGEP-Pred 2.0: A web application for predicting drug-induced cell signaling and gene expression changes.","authors":"Sergey M Ivanov, Anastasia V Rudik, Alexey A Lagunin, Dmitry A Filimonov, Vladimir V Poroikov","doi":"10.1002/minf.202400032","DOIUrl":"https://doi.org/10.1002/minf.202400032","url":null,"abstract":"<p><p>The analysis of drug-induced gene expression profiles (DIGEP) is widely used to estimate the potential therapeutic and adverse drug effects as well as the molecular mechanisms of drug action. However, the corresponding experimental data is absent for many existing drugs and drug-like compounds. To solve this problem, we created the DIGEP-Pred 2.0 web application, which allows predicting DIGEP and potential drug targets by structural formula of drug-like compounds. It is based on the combined use of structure-activity relationships (SARs) and network analysis. SAR models were created using PASS (Prediction of Activity Spectra for Substances) technology for data from the Comparative Toxicogenomics Database (CTD), the Connectivity Map (CMap) for the prediction of DIGEP, and PubChem and ChEMBL for the prediction of molecular mechanisms of action (MoA). Using only the structural formula of a compound, the user can obtain information on potential gene expression changes in several cell lines and drug targets, which are potential master regulators responsible for the observed DIGEP. The mean accuracy of prediction calculated by leave-one-out cross validation was 86.5 % for 13377 genes and 94.8 % for 2932 proteins (CTD data), and it was 97.9 % for 2170 MoAs. SAR models (mean accuracy-87.5 %) were also created for CMap data given on MCF7, PC3, and HL60 cell lines with different threshold values for the logarithm of fold changes: 0.5, 0.7, 1, 1.5, and 2. 
Additionally, the data on pathways (KEGG, Reactome), biological processes of Gene Ontology, and diseases (DisGeNet) enriched by the predicted genes, together with the estimation of target-master regulators based on OmniPath data, is also provided. DIGEP-Pred 2.0 web application is freely available at https://www.way2drug.com/digep-pred.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":2.8,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141559261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shrilakshmi Sheshagiri Rao, Shankar V Kundapura, Debayan Dey, Chandrasekaran Palaniappan, Kanagaraj Sekar, Ananda Kulal, Udupi A Ramagopal
{"title":"Cumulative phylogenetic, sequence and structural analysis of Insulin superfamily proteins provide unique structure-function insights.","authors":"Shrilakshmi Sheshagiri Rao, Shankar V Kundapura, Debayan Dey, Chandrasekaran Palaniappan, Kanagaraj Sekar, Ananda Kulal, Udupi A Ramagopal","doi":"10.1002/minf.202300160","DOIUrl":"https://doi.org/10.1002/minf.202300160","url":null,"abstract":"<p><p>The insulin superfamily proteins (ISPs), in particular, insulin, IGFs and relaxin proteins are key modulators of animal physiology. They are known to have evolved from the same ancestral gene and have diverged into proteins with varied sequences and distinct functions, but maintain a similar structural architecture stabilized by highly conserved disulphide bridges. The recent surge of sequence data and the structures of these proteins prompted a need for a comprehensive analysis, which connects the evolution of these sequences (427 sequences) in the light of available functional and structural information including representative complex structures of ISPs with their cognate receptors. This study reveals (a) unusually high sequence conservation of IGFs (>90 % conservation in 184 sequences) and provides a possible structure-based rationale for such high sequence conservation; (b) provides an updated definition of the receptor-binding signature motif of the functionally diverse relaxin family members (c) provides a probable non-canonical C-peptide cleavage site in a few insulin sequences. The high conservation of IGFs appears to represent a classic case of resistance to sequence diversity exerted by physiologically important interactions with multiple partners. We also propose a probable mechanism for C-peptide cleavage in a few distinct insulin sequences and redefine the receptor-binding signature motif of the relaxin family. 
Lastly, we provide a basis for minimally modified insulin mutants with potential therapeutic application, inspired by concomitant changes observed in other insulin superfamily protein members supported by molecular dynamics simulation.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":2.8,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141555195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Moritz Walter, Jens M Borghardt, Lina Humbeck, Miha Skalic
{"title":"Multi-Task ADME/PK prediction at industrial scale: leveraging large and diverse experimental datasets.","authors":"Moritz Walter, Jens M Borghardt, Lina Humbeck, Miha Skalic","doi":"10.1002/minf.202400079","DOIUrl":"https://doi.org/10.1002/minf.202400079","url":null,"abstract":"<p><p>ADME (Absorption, Distribution, Metabolism, Excretion) properties are key parameters to judge whether a drug candidate exhibits a desired pharmacokinetic (PK) profile. In this study, we tested multi-task machine learning (ML) models to predict ADME and animal PK endpoints trained on in-house data generated at Boehringer Ingelheim. Models were evaluated both at the design stage of a compound (i.e., no experimental data of test compounds available) and at the testing stage when a particular assay would be conducted (i.e., experimental data of earlier conducted assays may be available). Using realistic time-splits, we found a clear benefit in performance of multi-task graph-based neural network models over single-task models, which was even stronger when experimental data of earlier assays was available. In an attempt to explain the success of multi-task models, we found that especially endpoints with the largest numbers of data points (physicochemical endpoints, clearance in microsomes) are responsible for increased predictivity in more complex ADME and PK endpoints. 
In summary, our study provides insight into how data for multiple ADME/PK endpoints in a pharmaceutical company can be best leveraged to optimize predictivity of ML models.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":2.8,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141555197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chemoinformatic regression methods and their applicability domain.","authors":"Thomas-Martin Dutschmann, Valerie Schlenker, Knut Baumann","doi":"10.1002/minf.202400018","DOIUrl":"10.1002/minf.202400018","url":null,"abstract":"<p><p>The growing interest in chemoinformatic model uncertainty calls for a summary of the most widely used regression techniques and how to estimate their reliability. Regression models learn a mapping from the space of explanatory variables to the space of continuous output values. Among other limitations, the predictive performance of the model is restricted by the training data used for model fitting. Identification of unusual objects by outlier detection methods can improve model performance. Additionally, proper model evaluation necessitates defining the limitations of the model, often called the applicability domain. Comparable to certain classifiers, some regression techniques come with built-in methods or augmentations to quantify their (un)certainty, while others rely on generic procedures. The theoretical background of their working principles and how to deduce specific and general definitions for their domain of applicability shall be explained.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":2.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141158308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Molecular Informatics. Pub Date: 2024-07-01. Epub Date: 2024-06-10. DOI: 10.1002/minf.202300339
Julia Revillo Imbernon, Jean-Marc Weibel, Eric Ennifar, Gilles Prévost, Esther Kellenberger
{"title":"Structural analysis of neomycin B and kanamycin A binding Aminoglycosides Modifying Enzymes (AME) and bacterial ribosomal RNA.","authors":"Julia Revillo Imbernon, Jean-Marc Weibel, Eric Ennifar, Gilles Prévost, Esther Kellenberger","doi":"10.1002/minf.202300339","DOIUrl":"10.1002/minf.202300339","url":null,"abstract":"<p><p>Aminoglycosides are crucial antibiotics facing challenges from bacterial resistance. This study addresses the importance of aminoglycoside modifying enzymes in the context of escalating resistance. Drawing upon over two decades of structural data in the Protein Data Bank, we focused on two key antibiotics, neomycin B and kanamycin A, to explore how the aminoglycoside structure is exploited by this family of enzymes. A systematic comparison across diverse enzymes and the RNA A-site target identified common characteristics in the recognition mode, while assessing the adaptability of neomycin B and kanamycin A in various environments.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":2.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141296441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Molecular Informatics. Pub Date: 2024-07-01. Epub Date: 2024-06-12. DOI: 10.1002/minf.202300259
Milo Roucairol, Tristan Cazenave
{"title":"Comparing search algorithms on the retrosynthesis problem.","authors":"Milo Roucairol, Tristan Cazenave","doi":"10.1002/minf.202300259","DOIUrl":"10.1002/minf.202300259","url":null,"abstract":"<p><p>In this article we try different algorithms, namely Nested Monte Carlo Search and Greedy Best First Search, on AstraZeneca's open source retrosynthetic tool, AiZynthFinder. We compare these algorithms to AiZynthFinder's base Monte Carlo Tree Search on a benchmark selected from the PubChem database and by Bayer's chemists. We show that both Nested Monte Carlo Search and Greedy Best First Search outperform AstraZeneca's Monte Carlo Tree Search, with a slight advantage for Nested Monte Carlo Search while experimenting on a playout heuristic. We also show how the search algorithms are bounded by the quality of the policy network; to improve our results, the next step is to improve the policy network.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":2.8,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141306331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robert Fraczkiewicz, Huy Quoc Nguyen, Newton Wu, Nina Kausch‐Busies, Sergio Grimbs, Kai Sommer, Antonius ter Laak, Judith Günther, Björn Wagner, Michael Reutlinger
{"title":"Best of both worlds: An expansion of the state of the art pKa model with data from three industrial partners","authors":"Robert Fraczkiewicz, Huy Quoc Nguyen, Newton Wu, Nina Kausch‐Busies, Sergio Grimbs, Kai Sommer, Antonius ter Laak, Judith Günther, Björn Wagner, Michael Reutlinger","doi":"10.1002/minf.202400088","DOIUrl":"https://doi.org/10.1002/minf.202400088","url":null,"abstract":"In a unique collaboration between Simulations Plus and several industrial partners, we were able to develop a new version 11.0 of the previously published <jats:italic>in silico</jats:italic> pK<jats:sub>a</jats:sub> model, S+pKa, with considerably improved prediction accuracy. The model's training set was vastly expanded by large amounts of experimental data obtained from F. Hoffmann‐La Roche AG, Genentech Inc., and the Crop Science division of Bayer AG. The previous v7.0 of S+pKa was trained on data from public sources and the Pharmaceutical division of Bayer AG. The model has shown dramatic improvements in predictive accuracy when externally validated on three new contributor compound sets. Less expected was v11.0’s improvement in prediction on new compounds developed at Bayer Pharma after v7.0 was released (2013–2023), even without contributing additional data to v11.0. 
We illustrate chemical space coverage by chemistries encountered in the five domains, public and industrial, outline model construction, and discuss factors contributing to model's success.","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141504470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring drug repositioning possibilities of kinase inhibitors via molecular simulation","authors":"Qing‐Xin Wang, Jiao Cai, Zi‐Jun Chen, Jia‐Chuan Liu, Jing‐Jing Wang, Hai Zhou, Qing‐Qing Li, Zi‐Xuan Wang, Yi‐Bo Wang, Zhen‐Jiang Tong, Jin Yang, Tian‐Hua Wei, Meng‐Yuan Zhang, Yun Zhou, Wei‐Chen Dai, Ning Ding, Xue‐Jiao Leng, Xiao‐Ying Yin, Shan‐Liang Sun, Yan‐Cheng Yu, Nian‐Guang Li, Zhi‐Hao Shi","doi":"10.1002/minf.202300336","DOIUrl":"https://doi.org/10.1002/minf.202300336","url":null,"abstract":"Kinases, a class of enzymes controlling various substrates phosphorylation, are pivotal in both physiological and pathological processes. Although their conserved ATP binding pockets pose challenges for achieving selectivity, this feature offers opportunities for drug repositioning of kinase inhibitors (KIs). This study presents a cost‐effective in silico prediction of KIs drug repositioning via analyzing cross‐docking results. We established the KIs database (278 unique KIs, 1834 bioactivity data points) and kinases database (357 kinase structures categorized by the DFG motif) for carrying out cross‐docking. Comparative analysis of the docking scores and reported experimental bioactivity revealed that the Atypical, TK, and TKL superfamilies are suitable for drug repositioning. Among these kinase superfamilies, Olverematinib, Lapatinib, and Abemaciclib displayed enzymatic activity in our focused AKT‐PI3K‐mTOR pathway with IC<jats:sub>50</jats:sub> values of 3.3, 3.2 and 5.8 μM. Further cell assays showed IC<jats:sub>50</jats:sub> values of 0.2, 1.2 and 0.6 μM in tumor cells. 
The consistent result between prediction and validation demonstrated that repositioning KIs via <jats:italic>in silico</jats:italic> method is feasible.","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141504468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alejandro Gómez‐García, Ann‐Kathrin Prinz, Daniel A. Acuña Jiménez, William J. Zamora, Haruna L. Barazorda‐Ccahuana, Miguel Á. Chávez‐Fumagalli, Marilia Valli, Adriano D. Andricopulo, Vanderlan da S. Bolzani, Dionisio A. Olmedo, Pablo N. Solís, Marvin J. Núñez, Johny R. Rodríguez Pérez, Hoover A. Valencia Sánchez, Héctor F. Cortés Hernández, Oscar M. Mosquera Martinez, Oliver Koch, José L. Medina‐Franco
{"title":"Updating and profiling the natural product‐likeness of Latin American compound libraries","authors":"Alejandro Gómez‐García, Ann‐Kathrin Prinz, Daniel A. Acuña Jiménez, William J. Zamora, Haruna L. Barazorda‐Ccahuana, Miguel Á. Chávez‐Fumagalli, Marilia Valli, Adriano D. Andricopulo, Vanderlan da S. Bolzani, Dionisio A. Olmedo, Pablo N. Solís, Marvin J. Núñez, Johny R. Rodríguez Pérez, Hoover A. Valencia Sánchez, Héctor F. Cortés Hernández, Oscar M. Mosquera Martinez, Oliver Koch, José L. Medina‐Franco","doi":"10.1002/minf.202400052","DOIUrl":"https://doi.org/10.1002/minf.202400052","url":null,"abstract":"Compound databases of natural products play a crucial role in drug discovery and development projects and have implications in other areas, such as food chemical research, ecology and metabolomics. Recently, we put together the first version of the Latin American Natural Product database (LANaPDB) as a collective effort of researchers from six countries to ensemble a public and representative library of natural products in a geographical region with a large biodiversity. The present work aims to conduct a comparative and extensive profiling of the natural product‐likeness of an updated version of LANaPDB and the individual ten compound databases that form part of LANaPDB. The natural product‐likeness profile of the Latin American compound databases is contrasted with the profile of other major natural product databases in the public domain and a set of small‐molecule drugs approved for clinical use. As part of the extensive characterization, we employed several chemoinformatics metrics of natural product likeness. 
The results of this study will capture the attention of the global community engaged in natural product databases, not only in Latin America but across the world.","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141504469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}