{"title":"Deep learning of protein–ligand interactions—Remembering the actors","authors":"Jürgen Bajorath","doi":"10.1016/j.ailsci.2022.100037","DOIUrl":"10.1016/j.ailsci.2022.100037","url":null,"abstract":"","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318522000083/pdfft?md5=63e8dc2f154d93e6ede44a89727be89e&pid=1-s2.0-S2667318522000083-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44934025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding uncertainty in deep learning builds confidence","authors":"Jürgen Bajorath","doi":"10.1016/j.ailsci.2022.100033","DOIUrl":"10.1016/j.ailsci.2022.100033","url":null,"abstract":"","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318522000046/pdfft?md5=b881f0e2a53af340f6a1b73b950b6d6f&pid=1-s2.0-S2667318522000046-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44028547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpretation of multi-task clearance models from molecular images supported by experimental design","authors":"Andrés Martínez Mora , Mickael Mogemark , Vigneshwari Subramanian , Filip Miljković","doi":"10.1016/j.ailsci.2022.100048","DOIUrl":"10.1016/j.ailsci.2022.100048","url":null,"abstract":"<div><p>Recent methodological advances in deep learning (DL) architectures have not only improved the performance of predictive models but also enhanced their interpretability potential, thus considerably increasing their transparency. In the context of medicinal chemistry, the potential to not only accurately predict molecular properties, but also chemically interpret them, would be strongly preferred. Previously, we developed accurate multi-task convolutional neural network (CNN) and graph convolutional neural network (GCNN) models to predict a set of diverse intrinsic metabolic clearance parameters from image- and graph-based molecular representations, respectively. Herein, we introduce several model interpretability frameworks to answer whether the model explanations obtained from CNN and GCNN multi-task clearance models could be applied to predict chemical transformations associated with experimentally confirmed metabolic products. We show a strong correlation between the CNN pixel intensities and corresponding clearance predictions, as well as their robustness to different molecular orientations. 
Using actual case examples, we demonstrate that both CNN and GCNN interpretations frequently complement each other, suggesting their high potential for combined use in guiding medicinal chemistry design.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318522000186/pdfft?md5=fc7537dd4777fa93dd0a74d1d81c0c55&pid=1-s2.0-S2667318522000186-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41622538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding the performance of knowledge graph embeddings in drug discovery","authors":"Stephen Bonner , Ian P. Barrett , Cheng Ye , Rowan Swiers , Ola Engkvist , Charles Tapley Hoyt , William L. Hamilton","doi":"10.1016/j.ailsci.2022.100036","DOIUrl":"https://doi.org/10.1016/j.ailsci.2022.100036","url":null,"abstract":"<div><p>Knowledge Graphs (KG) and associated Knowledge Graph Embedding (KGE) models have recently begun to be explored in the context of drug discovery and have the potential to assist in key challenges such as target identification. In the drug discovery domain, KGs can be employed as part of a process which can result in lab-based experiments being performed, or influence other decisions, incurring significant time and financial costs and, most importantly, ultimately influencing patient healthcare. For KGE models to have impact in this domain, a better understanding not only of performance but also of the various factors which determine it is required.</p><p>In this study we investigate, over the course of many thousands of experiments, the predictive performance of five KGE models on two public drug discovery-oriented KGs. Our goal is not to focus on the best overall model or configuration; instead, we take a deeper look at how performance can be affected by changes in the training setup, choice of hyperparameters, model parameter initialisation seed and different splits of the datasets. Our results highlight that these factors have significant impact on performance and can even affect the ranking of models. 
Indeed, these factors should be reported along with model architectures to ensure complete reproducibility and fair comparison of future work, and we argue this is critical for the acceptance, use, and impact of KGEs in a biomedical setting.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318522000071/pdfft?md5=06ed4e6a1e3c501ecb6c465108f88691&pid=1-s2.0-S2667318522000071-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91728647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HematoNet: Expert level classification of bone marrow cytology morphology in hematological malignancy with deep learning","authors":"Satvik Tripathi , Alisha Isabelle Augustin , Rithvik Sukumaran , Suhani Dheer , Edward Kim","doi":"10.1016/j.ailsci.2022.100043","DOIUrl":"https://doi.org/10.1016/j.ailsci.2022.100043","url":null,"abstract":"<div><p>There have been few efforts to automate the cytomorphological categorization of bone marrow cells. For bone marrow cell categorization, deep-learning algorithms have been limited to a small number of samples or disease classifications. In this paper, we propose a pipeline to classify bone marrow cells despite these limitations. Data augmentation was applied across the dataset to resolve class imbalances. Then, random transformations such as rotating by 0° to 90°, zooming in/out, flipping horizontally and/or vertically, and translating were performed. The model used in the pipeline was a CoAtNet, which was compared with two baseline models, EfficientNetV2 and ResNext50. We then analyzed the CoAtNet model using SmoothGrad and Grad-CAM, two recently developed algorithms that have been shown to meet the fundamental requirements for explainability methods. 
After evaluating all three models’ performance on each of the distinct morphological classes, we found that the proposed CoAtNet model outperformed the EfficientNetV2 and ResNext50 models owing to its attention network property, which improved the algorithm's learning curve, as represented by a precision-recall curve.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318522000137/pdfft?md5=ae12125aef4855e7cfd36f2c405d139f&pid=1-s2.0-S2667318522000137-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91728650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Iterative DeepSARM modeling for compound optimization","authors":"Atsushi Yoshimori , Jürgen Bajorath","doi":"10.1016/j.ailsci.2021.100015","DOIUrl":"10.1016/j.ailsci.2021.100015","url":null,"abstract":"<div><p>The Structure-Activity Relationship (SAR) Matrix (SARM) method systematically extracts structurally related compound series from any source and organizes these series in a unique data structure formed by matrices similar to R-group tables from medicinal chemistry. In addition, the SARM method generates virtual analogues for structurally organized series that consist of new combinations of existing core structures and R-groups. For active compounds, SARMs visualize SAR patterns and aid in compound design. The SARM methodology and data structure were integrated with a recurrent neural network architecture to further expand the compound design capacity with deep generative models, leading to the DeepSARM approach. Herein, we present an extension of the DeepSARM framework for compound optimization termed iterative DeepSARM (iDeepSARM), which involves multiple iterations of deep generative modeling and fine-tuning to obtain compounds that are increasingly likely to be active against targets of interest. Hence, iDeepSARM adds computational hit-to-lead and lead optimization capability to the DeepSARM framework. 
In addition to detailing methodological features and calculation protocols, an exemplary compound design application is reported to illustrate the iDeepSARM approach.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318521000155/pdfft?md5=64c96435c7527c83c4f92d37a0c7edc8&pid=1-s2.0-S2667318521000155-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41611840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dealing with a data-limited regime: Combining transfer learning and transformer attention mechanism to increase aqueous solubility prediction performance","authors":"Magdalena Wiercioch , Johannes Kirchmair","doi":"10.1016/j.ailsci.2021.100021","DOIUrl":"10.1016/j.ailsci.2021.100021","url":null,"abstract":"<div><p>Aqueous solubility is a key chemical property that drives various processes in chemistry and biology. Its computational prediction is challenging, as evidenced by the fact that it has been a subject of considerable interest for several decades. Recent work has explored fingerprint-based, feature-based and graph-based representations with different machine learning and deep learning methodologies. In general, many traditional methods have been proposed, but they rely heavily on the quality of the rule-based, hand-crafted features. On the other hand, limitations in the quality of aqueous solubility data become a handicap when training deep models. In this study, we have developed a novel structure-aware method for the prediction of aqueous solubility by introducing a new deep network architecture and then employing a transfer learning approach. The model proved to be competitive, obtaining an RMSE of 0.587 during both cross-validation and a test on an independent dataset. Specifically, the method was evaluated on molecules downloaded from the Online Chemical Database and Modeling Environment (OCHEM). 
Beyond aqueous solubility prediction, the strategy presented in this work may be useful for modeling any kind of (chemical or biological) properties for which there is a limited amount of data available for model training.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318521000210/pdfft?md5=6e2846286bacbae3a9814188cafabd4f&pid=1-s2.0-S2667318521000210-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47243540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning Based Prediction of COVID-19 Mortality Suggests Repositioning of Anticancer Drug for Treating Severe Cases","authors":"Thomas Linden , Frank Hanses , Daniel Domingo-Fernández , Lauren Nicole DeLong , Alpha Tom Kodamullil , Jochen Schneider , Maria J.G.T. Vehreschild , Julia Lanznaster , Maria Madeleine Ruethrich , Stefan Borgmann , Martin Hower , Kai Wille , Torsten Feldt , Siegbert Rieg , Bernd Hertenstein , Christoph Wyen , Christoph Roemmele , Jörg Janne Vehreschild , Carolin E.M. Jakob , Melanie Stecher , Holger Fröhlich","doi":"10.1016/j.ailsci.2021.100020","DOIUrl":"10.1016/j.ailsci.2021.100020","url":null,"abstract":"<div><p>Despite available vaccinations, COVID-19 case numbers around the world are still growing, and effective medications against severe cases are lacking. In this work, we developed a machine learning model which predicts mortality for COVID-19 patients using data from the multi-center ‘Lean European Open Survey on SARS-CoV-2-infected patients’ (LEOSS) observational study (>100 active sites in Europe, primarily in Germany), resulting in an AUC of almost 80%. We showed that molecular mechanisms related to dementia, one of the relevant predictors in our model, intersect with those associated with COVID-19. Most notably, among these molecules was tyrosine kinase 2 (TYK2), a protein that has been patented as a drug target in Alzheimer's Disease but is also genetically associated with severe COVID-19 outcomes. We experimentally verified that the anticancer drugs Sorafenib and Regorafenib showed a clear anti-cytopathic effect in Caco2 and VERO-E6 cells and can thus be regarded as potential treatments against COVID-19. 
Altogether, our work demonstrates that interpretation of machine learning based risk models can point towards drug targets and new treatment options, which are strongly needed for COVID-19.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8677630/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39649778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OmicInt package: Exploring omics data and regulatory networks using integrative analyses and machine learning","authors":"Auste Kanapeckaite","doi":"10.1016/j.ailsci.2021.100025","DOIUrl":"10.1016/j.ailsci.2021.100025","url":null,"abstract":"<div><p><em>OmicInt</em> is an R software package developed for a user-friendly and in-depth exploration of significantly changed genes, gene expression patterns, and the associated epigenetic features as well as the related miRNA environment. In addition, <em>OmicInt</em> offers single-cell RNA-seq and proteomics data integration to elucidate specific expression profiles. To achieve this, <em>OmicInt</em> builds on a novel scoring function capturing expression and pathology associations. The developed scoring function together with the implemented Gaussian mixture modelling pipeline helps to explore genes and the linked interactome networks. The machine learning pipeline was designed to make the analyses straightforward for non-experts so that researchers could take advantage of advanced analytics for their data evaluation. Additional functionalities, such as protein type and cellular location classification, provide useful assessments of the key interactors. The introduced package can aid in studying specific gene networks, understanding cellular perturbation events, and exploring interactions that might not be easily detectable otherwise. Thus, this robust set of bioinformatics tools can be very beneficial in drug discovery and target evaluation. 
<em>OmicInt</em> is designed to be freely accessible to involve a larger bioinformatics community and continuously improve the developed algorithmic methods.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318521000258/pdfft?md5=8a49e27739636c1b6dadd1e75978907a&pid=1-s2.0-S2667318521000258-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45683888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maxsmi: Maximizing molecular property prediction performance with confidence estimation using SMILES augmentation and deep learning","authors":"Talia B. Kimber , Maxime Gagnebin , Andrea Volkamer","doi":"10.1016/j.ailsci.2021.100014","DOIUrl":"10.1016/j.ailsci.2021.100014","url":null,"abstract":"<div><p>Accurate molecular property or activity prediction is one of the main goals in computer-aided drug design. Quantitative structure-activity relationship (QSAR) modeling and machine learning, more recently deep learning, have become an integral part of this process. Such algorithms require large amounts of training data, which, in the case of physico-chemical and bioactivity data sets, remain scarce. To address the lack of data, augmentation techniques are increasingly applied in deep learning. Here, we exploit the fact that one compound can be represented by various SMILES strings as a means of data augmentation, and we explore several augmentation techniques. Convolutional and recurrent neural networks are trained on four data sets, including experimental solubility, lipophilicity, and bioactivity measurements. Moreover, the uncertainty of the models is assessed by applying augmentation to the test set. Our results show that data augmentation improves the accuracy independently of the deep learning model and of the size of the data. The best strategies lead to the Maxsmi models, the models that <strong>max</strong>imize the performance in <strong>SMI</strong>LES augmentation. Our findings show that the standard deviation of the per-SMILES predictions correlates with the accuracy of the associated compound prediction. In addition, our systematic testing of different augmentation strategies provides an extensive guideline to SMILES augmentation. 
A prediction tool using the Maxsmi models for novel compounds on the aforementioned physico-chemical and bioactivity tasks is made available at <span>https://github.com/volkamerlab/maxsmi</span>.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318521000143/pdfft?md5=2b8d2b601acd14d7fc4fb788c10b0c44&pid=1-s2.0-S2667318521000143-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45011603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}