{"title":"Pharmacological profiles of neglected tropical disease drugs","authors":"","doi":"10.1016/j.ailsci.2024.100116","DOIUrl":"10.1016/j.ailsci.2024.100116","url":null,"abstract":"<div><div>According to the World Health Organization, there is a group of 20 diverse infectious Neglected Tropical Disease (NTD) conditions that primarily affect populations in low-income and developing regions. Despite the limited attention and funding compared to other health concerns, significant efforts to develop drugs for treating and controlling NTDs have been made. However, there is room for developing NTD drugs with improved safety, efficacy and ecotoxicological profiles. In order to facilitate this, we have adapted our existing validated data-driven workflows for understanding disease comorbidity to systematically evaluate the approved drugs that target the major World Health Organization defined NTDs. The foundation for this work comprised assembling the physicochemical, biological and clinical properties of each NTD drug and identifying patterns that reveal the underlying cause of their efficacy and side-effect profiles. Subsequently, computational methods were employed to identify analogs with potentially improved profiles and validated in a case study focusing on the teratogenic antileishmanial drug miltefosine. 
The wider impact of NTD drugs with regard to a One Health cross-disciplinary perspective at the human-animal-environment interface is also discussed.</div></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142586762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DTA Atlas: A massive-scale drug repurposing database","authors":"","doi":"10.1016/j.ailsci.2024.100115","DOIUrl":"10.1016/j.ailsci.2024.100115","url":null,"abstract":"<div><div>The drug development process is costly and time-consuming. Repurposing existing approved drugs, an efficient and cost-effective strategy, involves assessing numerous drug-protein pairs to uncover new interactions. While modern <em>in silico</em> methods enhance scalability, an open database for projected drug-target interactions across the entire human proteome is still lacking. In this work, we introduce an open database of predicted drug-target interactions, termed <em>DTA Atlas</em>, covering the entire human proteome as well as a wide range of marketed drugs, resulting in over 220 million drug-target pairs. The database integrates 4 billion affinity predictions from advanced deep neural networks and offers a user-friendly web interface, enabling users to explore drug-target affinity predictions for the human proteome. To the best of our knowledge, DTA Atlas represents the first comprehensive collection of drug-target binding strength predictions. It is open-source and can serve as an important resource for drug development, drug repurposing, toxicity studies and more.</div></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142525925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling PROTAC degradation activity with machine learning","authors":"","doi":"10.1016/j.ailsci.2024.100104","DOIUrl":"10.1016/j.ailsci.2024.100104","url":null,"abstract":"<div><p>PROTACs are a promising therapeutic modality that harnesses the cell’s built-in degradation machinery to degrade specific proteins. Despite their potential, developing new PROTACs is challenging and requires significant domain expertise, time, and cost. Meanwhile, machine learning has transformed drug design and development. In this work, we present a strategy for curating open-source PROTAC data and an open-source deep learning tool for predicting the degradation activity of novel PROTAC molecules. The curated dataset incorporates important information such as <span><math><mrow><mi>p</mi><mi>D</mi><msub><mrow><mi>C</mi></mrow><mrow><mn>50</mn></mrow></msub></mrow></math></span>, <span><math><msub><mrow><mi>D</mi></mrow><mrow><mi>m</mi><mi>a</mi><mi>x</mi></mrow></msub></math></span>, E3 ligase type, POI amino acid sequence, and experimental cell type. Our model architecture leverages learned embeddings from pretrained machine learning models, in particular for encoding protein sequences and cell type information. We assessed the quality of the curated data and the generalization ability of our model architecture against new PROTACs and targets via three tailored studies, which we recommend other researchers to use in evaluating their degradation activity models. In each study, three models predict protein degradation in a majority vote setting, reaching a top test accuracy of 82.6% and 0.848 ROC AUC, and a test accuracy of 61% and 0.615 ROC AUC when generalizing to novel protein targets. 
Our results are not only comparable to state-of-the-art models for protein degradation prediction, but also part of an open-source implementation which is easily reproducible and less computationally complex than existing approaches.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318524000114/pdfft?md5=fbcd6191bbd4f65eeacdd8602953af66&pid=1-s2.0-S2667318524000114-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141960711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning proteochemometric models for Cereblon glue activity predictions","authors":"Francis J. Prael III , Jiayi Cox , Noé Sturm , Peter Kutchukian , William C. Forrester , Gregory Michaud , Jutta Blank , Lingling Shen , Raquel Rodríguez-Pérez","doi":"10.1016/j.ailsci.2024.100100","DOIUrl":"https://doi.org/10.1016/j.ailsci.2024.100100","url":null,"abstract":"<div><p>Targeted protein degradation (TPD) is a rapidly developing drug discovery technique with unique efficacy and target scope stemming from its degradation-based activity. Molecular glue degraders are a promising arm of TPD, as evidenced by the FDA-approved therapeutics within this class, the increasing number of degraders in clinical development, and their predisposition to drug-likeness. Cereblon (CRBN) glue degraders mediate target degradation by generating a neomorphic interface between CRBN and a protein of interest. While promising, the complicated nature of this CRBN-glue-target ternary complex makes the rational design of molecular glue degraders challenging. For other drug modalities, predictive modeling has been established to leverage existing activity data and generate quantitative structure-activity relationships (QSAR). However, the applicability of QSAR strategies for glues remains under-investigated. Herein, machine learning methodologies were developed to predict glue-mediated recruitment of CRBN to target proteins and achieved promising performance. Generated models leveraged more than a hundred internal screening campaigns across thousands of CRBN glues to predict glue-mediated recruitment of targets to CRBN. Our results show that recruitment activity of CRBN glue degraders can be modeled by machine learning, with 89 % of models producing an area under the receiver operating characteristic curve (ROC AUC) > 0.8 and 70 % of models producing a Matthews correlation coefficient (MCC) > 0.2 for these primary screening data. 
Importantly, our findings also indicate that the combination of compound and protein descriptors in the so-called proteochemometric models improves performance, with >80 % of the models exhibiting higher ROC AUC and MCC values than per-target models only based on compound information. Hence, our investigations suggest that proteochemometric modeling is a successful approach for molecular glue degraders. The proposed machine learning strategies can aid compound prioritization based on recruitment efficacy and target selectivity, thus have the potential to facilitate the design and discovery of therapeutic CRBN molecular glues.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318524000072/pdfft?md5=74a4c064cfb576ff403180c61ffdc97f&pid=1-s2.0-S2667318524000072-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141324462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical approaches enabling technology-specific assay interference prediction from large screening data sets","authors":"Vincenzo Palmacci , Steffen Hirte , Jorge Enrique Hernández González , Floriane Montanari , Johannes Kirchmair","doi":"10.1016/j.ailsci.2024.100099","DOIUrl":"https://doi.org/10.1016/j.ailsci.2024.100099","url":null,"abstract":"<div><p>High throughput screening (HTS) technologies allow the biological testing of hundreds of thousands of compounds per day. Typically, a substantial proportion of the initial hits obtained by HTS are artifacts caused by assay interference. Therefore, global and technology-specific in silico models for identifying and predicting compounds interfering with biological assays have been developed. The global models benefit from training on large screening data sets, while the specialized models benefit from training on assay technology-specific experimental data. In this work, we develop and explore strategies for generating better predictors of technology-specific assay interference by utilizing the large bioactivity data matrices global models are trained on and employing partially new compound labeling approaches to maintain the assay technology awareness of specialized models. We demonstrate the utility of the statistically derived interference labels in machine learning using fluorescence-based assay interference as a representative example. Our random forest and multi-layer perceptron classifiers showed improved performance compared to existing models, achieving Matthews correlation coefficients (MCCs) of up to 0.47 on holdout data and up to 0.45 on an external test set. 
These results demonstrate that accurate assay-specific interference labels can be derived from large bioactivity data matrices, enabling the development of new machine-learning models without the need for further experimental data.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318524000060/pdfft?md5=b99d896dcc34d54ad38a7b8ccb52ebda&pid=1-s2.0-S2667318524000060-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141289445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Federated learning for predicting compound mechanism of action based on image-data from cell painting","authors":"Li Ju , Andreas Hellander , Ola Spjuth","doi":"10.1016/j.ailsci.2024.100098","DOIUrl":"https://doi.org/10.1016/j.ailsci.2024.100098","url":null,"abstract":"<div><p>Having access to sufficient data is essential in order to train accurate machine learning models, but much data is not publicly available. In drug discovery this is particularly evident, as much data is withheld at pharmaceutical companies for various reasons. Federated Learning (FL) aims at training a joint model between multiple parties but without disclosing data between the parties. In this work, we leverage Federated Learning to predict compound Mechanism of Action (MoA) using fluorescence image data from cell painting. Our study evaluates the effectiveness and efficiency of FL, comparing to non-collaborative and data-sharing collaborative learning in diverse scenarios. Specifically, we investigate the impact of data heterogeneity across participants on MoA prediction, an essential concern in real-life applications of FL, and demonstrate the benefits for all involved parties. 
This work highlights the potential of federated learning in multi-institutional collaborative machine learning for drug discovery and assessment of chemicals, offering a promising avenue to overcome data-sharing constraints.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318524000059/pdfft?md5=100e1ed9ac27f95816db906647d11bc0&pid=1-s2.0-S2667318524000059-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140951069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An integrated approach to predict activators of NRF2 - the transcription factor for oxidative stress response","authors":"Yaroslav Chushak , Rebecca A. Clewell","doi":"10.1016/j.ailsci.2024.100097","DOIUrl":"https://doi.org/10.1016/j.ailsci.2024.100097","url":null,"abstract":"<div><p>A variety of environmental and physiological conditions can cause oxidative stress that damages cellular components such as DNA, proteins and lipids. Oxidative stress is implicated in many human diseases including cancer, cardiovascular diseases, neurological diseases, inflammatory diseases, and aging. The nuclear factor erythroid 2–related factor 2 (NRF2) is a transcription factor that plays a key role in the cellular antioxidant defense system as it regulates transcription of antioxidant proteins and detoxifying enzymes. There is an urgent need to identify novel compounds that activate NRF2 and enhance antioxidant defense. We collected data from the high-throughput screening of NRF2 activators and identified molecular fragments (structural alerts) associated with the activation of NRF2. We also developed ten classification models using different types of molecular descriptors and machine learning techniques. Two approaches were used to establish the applicability domain of developed models: the structure-based approach and the distance to model approach. The best performing model, which used a message passing neural network (MPNN) technique, showed an accuracy of 87 % for the test set of chemicals within the distance to model of 0.3. The integrative approach using a combination of generated structural alerts and MPNN model was used to screen approved drugs collected in the DrugBank to identify potential NRF2 activators. Out of 2393 screened chemicals, 138 compounds were predicted as NRF2 activators by both approaches. 
Analysis of these compounds showed that some drugs were already known activators of NRF2 while others are potentially novel activators.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318524000047/pdfft?md5=29a2ee24a6813324417f266b95b1e48d&pid=1-s2.0-S2667318524000047-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140606623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Artificial intelligence-open science symbiosis in chemoinformatics","authors":"Filip Miljković , José L. Medina-Franco","doi":"10.1016/j.ailsci.2024.100096","DOIUrl":"10.1016/j.ailsci.2024.100096","url":null,"abstract":"<div><p>In chemoinformatics, artificial intelligence (AI) continues to grow a symbiosis with open science (OS). Such a close AI-OS interaction brings substantial practical benefits in research, scientific dissemination, and education, to name a few areas. The AI-OS symbiosis can be further enhanced by combining sufficient substantive expertise, mathematical and statistical knowledge, and coding skills. This Viewpoint discusses the benefits of the smooth and productive interaction between AI, OS, and open data. We also present a short list of misconceptions and pitfalls surrounding AI-OS and propose correct responses and behaviors agreed upon by field experts. In addition, we provide suggestions to continue enhancing the positive contributions of the AI-OS symbiosis towards chemoinformatics.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318524000035/pdfft?md5=15b234d142847a979a68f7886068152e&pid=1-s2.0-S2667318524000035-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140276452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rationalism in the face of GPT hypes: Benchmarking the output of large language models against human expert-curated biomedical knowledge graphs","authors":"Negin Sadat Babaiha , Sathvik Guru Rao , Jürgen Klein , Bruce Schultz , Marc Jacobs , Martin Hofmann-Apitius","doi":"10.1016/j.ailsci.2024.100095","DOIUrl":"https://doi.org/10.1016/j.ailsci.2024.100095","url":null,"abstract":"<div><p>Biomedical knowledge graphs (KGs) hold valuable information regarding biomedical entities such as genes, diseases, biological processes, and drugs. KGs have been successfully employed in challenging biomedical areas such as the identification of pathophysiology mechanisms or drug repurposing. The creation of high-quality KGs typically requires labor-intensive multi-database integration or substantial human expert curation, both of which take time and contribute to the workload of data processing and annotation. Therefore, the use of automatic systems for KG building and maintenance is a prerequisite for the wide uptake and utilization of KGs. Technologies supporting the automated generation and updating of KGs typically make use of Natural Language Processing (NLP), which is optimized for extracting implicit triples described in relevant biomedical text sources. At the core of this challenge is how to improve the accuracy and coverage of the information extraction module by utilizing different models and tools. The emergence of pre-trained large language models (LLMs), such as ChatGPT, which has grown dramatically in popularity, has revolutionized the field of NLP, making them a potential candidate for text-based graph creation as well. So far, no work has investigated the power of LLMs on the generation of cause-and-effect networks and KGs encoded in Biological Expression Language (BEL). 
In this paper, we present initial studies towards one-shot BEL relation extraction using two different versions of the Generative Pre-trained Transformer (GPT) models and evaluate their performance by comparing the extracted results to a highly accurate BEL KG manually curated by domain experts.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318524000023/pdfft?md5=9137dd2a207653e4d13cb5b99ca17d48&pid=1-s2.0-S2667318524000023-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139710160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Origins and progression of the polypharmacology concept in drug discovery","authors":"Jürgen Bajorath","doi":"10.1016/j.ailsci.2024.100094","DOIUrl":"https://doi.org/10.1016/j.ailsci.2024.100094","url":null,"abstract":"","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318524000011/pdfft?md5=ef2f5411ede3a24f3429765640c3360c&pid=1-s2.0-S2667318524000011-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139107191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}