Molecular Informatics | Pub Date: 2024-05-01 | Epub Date: 2024-03-05 | DOI: 10.1002/minf.202300287
Nejra Granulo, Sergey Sosnin, Daniela Digles, Gerhard F Ecker
{"title":"The macrocycle inhibitor landscape of SLC-transporter.","authors":"Nejra Granulo, Sergey Sosnin, Daniela Digles, Gerhard F Ecker","doi":"10.1002/minf.202300287","DOIUrl":"10.1002/minf.202300287","url":null,"abstract":"<p><p>In recent years, interest in Solute Carrier (SLC) transporters has increased due to their potential as drug targets. At the same time, macrocycles have demonstrated promising activities as therapeutic agents. However, the overall macrocycle/SLC-transporter interaction landscape has not yet been fully revealed. In this study, we present a statistical analysis of macrocycles with measured activity against SLC transporters. Using a data mining pipeline based on KNIME, we retrieved a total of 825 bioactivity data points of macrocycles interacting with SLC transporters. For further analysis of the SLC inhibitor profiles, we developed an interactive KNIME workflow as well as an interactive map of the chemical space coverage utilizing parametric t-SNE models. The parametric t-SNE models provide good discrimination among the targets of several corresponding SLC subfamilies. The KNIME workflow, the dataset, and the visualization tool are freely available to the community.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202300287"},"PeriodicalIF":2.8,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11475418/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139576130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting S. aureus antimicrobial resistance with interpretable genomic space maps.","authors":"Karina Pikalyova, Alexey Orlov, Dragos Horvath, Gilles Marcou, Alexandre Varnek","doi":"10.1002/minf.202300263","DOIUrl":"10.1002/minf.202300263","url":null,"abstract":"<p><p>Increasing antimicrobial resistance (AMR) represents a global healthcare threat. To decrease the spread of AMR and associated mortality, methods for rapid selection of optimal antibiotic treatment are urgently needed. Machine learning (ML) models based on genomic data to predict resistant phenotypes can serve as a fast screening tool prior to phenotypic testing. Nonetheless, many existing ML methods lack interpretability. Therefore, we present a methodology for visualization of sequence space and AMR prediction based on the non-linear dimensionality reduction method - generative topographic mapping (GTM). This approach, applied to AMR data of >5000 S. aureus isolates retrieved from the PATRIC database, yielded GTM models with reasonable accuracy for all drugs (balanced accuracy values ≥0.75). The Generative Topographic Maps (GTMs) represent data in the form of illustrative maps of the genomic space and allow for antibiotic-wise comparison of resistant phenotypes. The maps were also found to be useful for the analysis of genetic determinants responsible for drug resistance. 
Overall, the GTM-based methodology is a useful tool for both the illustrative exploration of the genomic sequence space and AMR prediction.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202300263"},"PeriodicalIF":3.6,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139932061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mirjana Antonijevic, Jana Sopkova‐de Oliveira Santos, Patrick Dallemagne, Christophe Rochais
{"title":"Discovery of a pocket network on the domain 5 of the TrkB receptor – A potential new target in the quest for the new ligands","authors":"Mirjana Antonijevic, Jana Sopkova‐de Oliveira Santos, Patrick Dallemagne, Christophe Rochais","doi":"10.1002/minf.202400043","DOIUrl":"https://doi.org/10.1002/minf.202400043","url":null,"abstract":"The important role of the neurotrophin tyrosine kinase receptor TrkB in the pathogenesis of several neurodegenerative conditions, such as Alzheimer's disease, Parkinson's disease, and Huntington's disease, has been well described. This is not surprising since, under physiological conditions, once activated by brain‐derived neurotrophic factor (BDNF) and neurotrophin‐4/5 (NT‐4/5), the TrkB receptor promotes neuronal survival, differentiation and synaptic function. Considering that the natural ligands of the TrkB receptor are large proteins, it is a challenge to discover small molecules capable of mimicking their effects. Even though the surface of the receptor that interacts with BDNF or NT‐4/5 is known, it has remained an open question which pockets and interactions are responsible for its activation. To answer this question, we used molecular dynamics (MD) simulations and the Pocketron algorithm, which enabled us to detect, for the first time, a pocket network in the interacting domain (d5) of the receptor, to describe its pockets, and to see how they communicate with each other.
This discovery reveals potential new areas on the receptor that can be targeted in structure‐based drug design for the development of new ligands.","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":"17 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140575770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Molecular Informatics | Pub Date: 2024-04-01 | Epub Date: 2024-02-06 | DOI: 10.1002/minf.202300183
Gian Marco Ghiandoni, Stuart R Flanagan, Michael J Bodkin, Maria Giulia Nizi, Albert Galera-Prat, Annalaura Brai, Beining Chen, James E A Wallace, Dimitar Hristozov, James Webster, Giuseppe Manfroni, Lari Lehtiö, Oriana Tabarrini, Valerie J Gillet
{"title":"Synthetically accessible de novo design using reaction vectors: Application to PARP1 inhibitors.","authors":"Gian Marco Ghiandoni, Stuart R Flanagan, Michael J Bodkin, Maria Giulia Nizi, Albert Galera-Prat, Annalaura Brai, Beining Chen, James E A Wallace, Dimitar Hristozov, James Webster, Giuseppe Manfroni, Lari Lehtiö, Oriana Tabarrini, Valerie J Gillet","doi":"10.1002/minf.202300183","DOIUrl":"10.1002/minf.202300183","url":null,"abstract":"<p><p>De novo design has been a hotly pursued topic for many years. Most recent developments have involved the use of deep learning methods for generative molecular design. Despite increasing levels of algorithmic sophistication, the design of molecules that are synthetically accessible remains a major challenge. Reaction-based de novo design takes a conceptually simpler approach and aims to address synthesisability directly by mimicking synthetic chemistry and driving structural transformations by known reactions that are applied in a stepwise manner. However, the use of a small number of hand-coded transformations restricts the chemical space that can be accessed and there are few examples in the literature where molecules and their synthetic routes have been designed and executed successfully. Here we describe the application of reaction-based de novo design to the design of synthetically accessible and biologically active compounds as proof-of-concept of our reaction vector-based software. Reaction vectors are derived automatically from known reactions and allow access to a wide region of synthetically accessible chemical space. The design was aimed at producing molecules that are active against PARP1 and which have improved brain penetration properties compared to existing PARP1 inhibitors. We synthesised a selection of the designed molecules according to the provided synthetic routes and tested them experimentally. 
The results demonstrate that reaction vectors can be applied to the design of novel molecules of biological relevance that are also synthetically accessible.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202300183"},"PeriodicalIF":2.8,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11475289/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139521506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic generation of functional peptides with desired bioactivity and membrane permeability using Bayesian optimization.","authors":"Itsuki Fukunaga, Yuki Matsukiyo, Kazuma Kaitoh, Yoshihiro Yamanishi","doi":"10.1002/minf.202300148","DOIUrl":"10.1002/minf.202300148","url":null,"abstract":"<p><p>Peptides are potentially useful modalities of drugs; however, cell membrane permeability is an obstacle in peptide drug discovery. The identification of bioactive peptides for a therapeutic target is also challenging because of the huge amino acid sequence patterns of peptides. In this study, we propose a novel computational method, PEptide generation system using Neural network Trained on Amino acid sequence data and Gaussian process-based optimizatiON (PENTAGON), to automatically generate new peptides with desired bioactivity and cell membrane permeability. In the algorithm, we mapped peptide amino acid sequences onto the latent space constructed using a variational autoencoder and searched for peptides with desired bioactivity and cell membrane permeability using Bayesian optimization. We used our proposed method to generate peptides with cell membrane permeability and bioactivity for each of the nine therapeutic targets, such as the estrogen receptor (ER). 
Our proposed method outperformed a previously developed peptide generator in terms of similarity to known active peptide sequences and the length of generated peptide sequences.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202300148"},"PeriodicalIF":3.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139106312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Molecular Informatics | Pub Date: 2024-04-01 | Epub Date: 2024-02-15 | DOI: 10.1002/minf.202300292
Milad Rayka, Morteza Mirzaei, Ali Mohammad Latifi
{"title":"An ensemble-based approach to estimate confidence of predicted protein-ligand binding affinity values.","authors":"Milad Rayka, Morteza Mirzaei, Ali Mohammad Latifi","doi":"10.1002/minf.202300292","DOIUrl":"10.1002/minf.202300292","url":null,"abstract":"<p><p>When designing a machine learning-based scoring function, we access a limited number of protein-ligand complexes with experimentally determined binding affinity values, representing only a fraction of all possible protein-ligand complexes. Consequently, it is crucial to report a measure of confidence and quantify the uncertainty in the model's predictions during test time. Here, we adopt the conformal prediction technique to evaluate the confidence of a prediction for each member of the core set of the CASF 2016 benchmark. The conformal prediction technique requires a diverse ensemble of predictors for uncertainty estimation. To this end, we introduce ENS-Score as an ensemble predictor, which includes 30 models with different protein-ligand representation approaches and achieves Pearson's correlation of 0.842 on the core set of the CASF 2016 benchmark. Also, we comprehensively investigate the residual error of each data point to assess the normality behavior of the distribution of the residual errors and their correlation to the structural features of the ligands, such as hydrophobic interactions and halogen bonding. In the end, we provide a local host web application to facilitate the usage of ENS-Score. 
All code needed to reproduce the results is provided at https://github.com/miladrayka/ENS_Score.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202300292"},"PeriodicalIF":3.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139735655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Molecular Informatics | Pub Date: 2024-04-01 | Epub Date: 2024-02-19 | DOI: 10.1002/minf.202300210
Souvik Pore, Arkaprava Banerjee, Kunal Roy
{"title":"Application of machine learning-based read-across structure-property relationship (RASPR) as a new tool for predictive modelling: Prediction of power conversion efficiency (PCE) for selected classes of organic dyes in dye-sensitized solar cells (DSSCs).","authors":"Souvik Pore, Arkaprava Banerjee, Kunal Roy","doi":"10.1002/minf.202300210","DOIUrl":"10.1002/minf.202300210","url":null,"abstract":"<p><p>The application of in-silico approaches to the prediction of various properties of materials has been an effective alternative to experimental methods. Recently, the concepts of quantitative structure-property relationship (QSPR) and read-across (RA) methods were merged to develop a new chemoinformatic tool: read-across structure-property relationship (RASPR). The RASPR method is applicable to both large and small datasets as it uses various similarity and error-based measures. It has also been observed that RASPR models tend to have an increased external predictivity compared to the corresponding QSPR models. In this study, we have modeled the power conversion efficiency (PCE) of organic dyes used in dye-sensitized solar cells (DSSCs) by using the quantitative RASPR (q-RASPR) method. We used three relatively large classes of organic dyes, phenothiazines (n=207), porphyrins (n=281), and triphenylamines (n=229), for modelling. We divided each of the datasets into training and test sets in 3 different combinations, and with the training sets we developed three different QSPR models with structural and physicochemical descriptors and validated them with the corresponding test sets.
The modeled descriptors were then used to calculate the RASPR descriptors using the Java-based tool RASAR Descriptor Calculator v2.0 (https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home), and data fusion was performed by pooling the previously selected structural and physicochemical descriptors with the calculated RASPR descriptors. A further feature selection algorithm was then employed to develop the final RASPR PLS models. Here, we also developed different machine learning (ML) models with the descriptors selected in the QSPR PLS and RASPR PLS models, and it was found that models with RASPR descriptors outperformed, in external predictivity, the models with only structural and physicochemical descriptors: RMSEP decreased for phenothiazines from 1.16-1.25 to 1.07-1.18, for porphyrins from 1.60-1.79 to 1.45-1.53, and for triphenylamines from 1.27-1.54 to 1.20-1.47.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202300210"},"PeriodicalIF":3.6,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139906082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Molecular Informatics | Pub Date: 2024-03-01 | Epub Date: 2024-01-23 | DOI: 10.1002/minf.202300249
Asu Busra Temizer, Gökçe Uludoğan, Rıza Özçelik, Taha Koulani, Elif Ozkirimli, Kutlu O Ulgen, Nilgun Karali, Arzucan Özgür
{"title":"Exploring data-driven chemical SMILES tokenization approaches to identify key protein-ligand binding moieties.","authors":"Asu Busra Temizer, Gökçe Uludoğan, Rıza Özçelik, Taha Koulani, Elif Ozkirimli, Kutlu O Ulgen, Nilgun Karali, Arzucan Özgür","doi":"10.1002/minf.202300249","DOIUrl":"10.1002/minf.202300249","url":null,"abstract":"<p><p>Machine learning models have found numerous successful applications in computational drug discovery. A large body of these models represents molecules as sequences since molecular sequences are easily available, simple, and informative. The sequence-based models often segment molecular sequences into pieces called chemical words, analogous to the words that make up sentences in human languages, and then apply advanced natural language processing techniques for tasks such as de novo drug design, property prediction, and binding affinity prediction. However, the chemical characteristics and significance of these building blocks, chemical words, remain unexplored. To address this gap, we employ data-driven SMILES tokenization techniques such as Byte Pair Encoding, WordPiece, and Unigram to identify chemical words and compare the resulting vocabularies. To understand the chemical significance of these words, we build a language-inspired pipeline that treats high affinity ligands of protein targets as documents and selects key chemical words making up those ligands based on tf-idf weighting. The experiments on multiple protein-ligand affinity datasets show that despite differences in words, lengths, and validity among the vocabularies generated by different subword tokenization algorithms, the identified key chemical words exhibit similarity. Further, we conduct case studies on a number of targets to analyze the impact of key chemical words on binding. We find that these key chemical words are specific to protein targets and correspond to known pharmacophores and functional groups.
Our approach elucidates chemical properties of the words identified by machine learning models and can be used in drug discovery studies to determine significant chemical moieties.</p>","PeriodicalId":18853,"journal":{"name":"Molecular Informatics","volume":" ","pages":"e202300249"},"PeriodicalIF":3.6,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139403684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}