{"title":"Optimizing active learning for free energy calculations","authors":"James Thompson , W Patrick Walters , Jianwen A Feng , Nicolas A Pabon , Hongcheng Xu , Michael Maser , Brian B Goldman , Demetri Moustakas , Molly Schmidt , Forrest York","doi":"10.1016/j.ailsci.2022.100050","DOIUrl":"10.1016/j.ailsci.2022.100050","url":null,"abstract":"<div><p>While Relative Binding Free Energy (RBFE) calculations have become a mainstay in lead optimization programs, the computational expense of performing these calculations has limited their broader application. Active learning (AL), a machine learning method used to direct a search iteratively, has explored larger chemical libraries using RBFE calculations. While AL has been successfully applied, there has not been a systematic study of the impact of parameter settings on the performance of AL. To address this gap, we have generated an exhaustive dataset of RBFE calculations on 10,000 congeneric molecules. We used this dataset to explore the impact of several AL design choices, including the number of molecules sampled at each iteration, the method used to select an initial sample, the method used to build a machine learning model, and the acquisition function that defines the balance between exploration and exploitation in the search. Our studies demonstrated that the performance of AL is largely insensitive to the specific machine learning method and acquisition functions used. In our studies, the most significant factor impacting performance was the number of molecules sampled at each iteration where selecting too few molecules hurts performance. Under the best conditions, we were able to identify 75% of the 100 top scoring molecules by sampling only 6% of the dataset. We hope that the dataset of 10K molecules will provide the basis for future studies exploring additional AL strategies. 
The source code and supporting data for the work are available at <span>https://github.com/google-research/google-research/tree/master/al_for_fep</span>.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"2 ","pages":"Article 100050"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318522000204/pdfft?md5=fd95fcb1f3da91cd7543db829403ca90&pid=1-s2.0-S2667318522000204-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48384591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling bioconcentration factors in fish with explainable deep learning","authors":"Linlin Zhao , Floriane Montanari , Henry Heberle , Sebastian Schmidt","doi":"10.1016/j.ailsci.2022.100047","DOIUrl":"10.1016/j.ailsci.2022.100047","url":null,"abstract":"<div><p>The Bioconcentration Factor (BCF) is an important parameter in the environmental risk assessment of chemicals, relevant for industrial and academic research as well as required in many regulatory contexts. It represents the potential of a substance to accumulate in organic tissues or whole animals and is most frequently measured in fish. However, animal welfare concerns, throughput limitations, and costs drive the need for alternative methods that allow accurate and reliable estimations of BCF in silico. We present a new deep learning model to predict BCF values from chemical structures that outperforms currently available models (<span><math><msup><mi>R</mi><mn>2</mn></msup></math></span> of 0.68 and RMSE of 0.59 log units on an external test set; <span><math><msup><mi>R</mi><mn>2</mn></msup></math></span> of 0.70 and RMSE of 0.74 log units in a demanding cluster split validation). The model is based on molecular representations encoded as CDDD descriptors and exploits a large in-house dataset with measured logD values as an auxiliary task.</p><p>Additionally, we developed a post-hoc explainability method based on SMILES character substitutions to accompany our predictions with atom-level interpretations. 
These sensitivity scores highlight the most influential moieties in the molecule and can help to understand the predictions better and design new molecules.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"2 ","pages":"Article 100047"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318522000174/pdfft?md5=d1e08bc12ac334ce4c4ea0eb17936560&pid=1-s2.0-S2667318522000174-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45371673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Symbolic regression for the interpretation of quantitative structure-property relationships","authors":"Katsushi Takaki , Tomoyuki Miyao","doi":"10.1016/j.ailsci.2022.100046","DOIUrl":"10.1016/j.ailsci.2022.100046","url":null,"abstract":"<div><p>The interpretation of quantitative structure–activity or structure–property relationships is important in the field of chemoinformatics. Although multivariate linear regression models are typically interpretable, they do not generally have high predictive abilities. Symbolic regression (SR) combined with genetic programming (GP) is a well-established technique for generating the mathematical expressions that describe the relationships within a dataset. However, SR sometimes produces complicated expressions that are hard for humans to interpret. This paper proposes a method for generating simpler expressions by incorporating three filters into GP-based SR. The filters are further combined with nonlinear least-squares optimization to give filter-introduced GP (FIGP), which improves the predictive ability of SR models while retaining simple expressions. As a proof-of-concept, the quantitative estimate of drug-likeness and the synthetic accessibility score are predicted based on the chemical structures of compounds. Overall, FIGP generates less-complicated expressions than previous SR methods. In terms of predictive ability, FIGP is better than GP, but is outperformed by a support vector machine with a radial basis function kernel. Furthermore, quantitative structure–activity relationship models are constructed for three matching molecular series with biological targets. 
In the case of one target, the activity prediction models given by FIGP exhibit better predictive ability than multivariate linear regression and support vector regression with the radial basis function kernel, whereas for the remaining cases, FIGP is slightly less accurate than multivariate linear regression.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"2 ","pages":"Article 100046"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318522000162/pdfft?md5=d40d5f4fb6a5861ba6faf6c4bcb2c52c&pid=1-s2.0-S2667318522000162-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42959550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deepitope: Prediction of HLA-independent T-cell epitopes mediated by MHC class II using a convolutional neural network","authors":"Raphael Trevizani , Fábio Lima Custódio","doi":"10.1016/j.ailsci.2022.100038","DOIUrl":"10.1016/j.ailsci.2022.100038","url":null,"abstract":"<div><p>Computational linear T-cell epitope prediction tools reduce the cost and labor of downstream <em>in vitro</em> testing, but the quality of currently available methods is compromised by the scarcity of experimental data and extensive HLA polymorphism. However, it is possible to improve prediction quality by forgoing HLA dependency, which allows all immunogenic sequences to be treated as a single group. This reduces the problem to a much simpler two-class classification: determining whether a peptide is immunogenic or not. Here, we use a deep convolutional neural network, trained on all peptides deposited in the IEDB, that is capable of predicting linear T-cell epitope regions in primary structures. We also investigate the possibility of using peptides derived from known human proteins as non-immunogenic counterexamples. We compare our model with a state-of-the-art tool and analyze the benefits of using larger databases. Our results corroborate the usefulness of HLA-free methods for practical applications that require the identification of immunogenic sequences. 
Deepitope is an open source project that can be found at <span>https://github.com/raphaeltrevizani/deepitope</span>.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"2 ","pages":"Article 100038"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318522000095/pdfft?md5=14ba0e71b89c009c171d8f8bde7e5f43&pid=1-s2.0-S2667318522000095-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43701924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corrigendum to “Machine Learning Based Prediction of COVID-19 Mortality Suggests Repositioning of Anticancer Drug for Treating Severe Cases”[Artificial Intelligence in Life Sciences] 1(2021), 100020","authors":"Thomas Linden , Frank Hanses , Daniel Domingo-Fernández , Lauren Nicole DeLong , Alpha Tom Kodamullil , Jochen Schneider , Maria J.G.T. Vehreschild , Julia Lanznaster , Maria Madeleine Ruethrich , Stefan Borgmann , Martin Hower , Kai Wille , Torsten Feldt , Siegbert Rieg , Bernd Hertenstein , Christoph Wyen , Christoph Roemmele , Jörg Janne Vehreschild , Carolin E.M. Jakob , Melanie Stecher , Holger Fröhlich","doi":"10.1016/j.ailsci.2022.100032","DOIUrl":"10.1016/j.ailsci.2022.100032","url":null,"abstract":"","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"2 ","pages":"Article 100032"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8824443/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39916555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AI for drug design: From explicit rules to deep learning","authors":"Lewis Mervin , Samuel Genheden , Ola Engkvist","doi":"10.1016/j.ailsci.2022.100041","DOIUrl":"10.1016/j.ailsci.2022.100041","url":null,"abstract":"","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"2 ","pages":"Article 100041"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318522000113/pdfft?md5=657b847f321004a995d4c509e863e3a9&pid=1-s2.0-S2667318522000113-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46445253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpretation of multi-task clearance models from molecular images supported by experimental design","authors":"Andrés Martínez Mora , Mickael Mogemark , Vigneshwari Subramanian , Filip Miljković","doi":"10.1016/j.ailsci.2022.100048","DOIUrl":"10.1016/j.ailsci.2022.100048","url":null,"abstract":"<div><p>Recent methodological advances in deep learning (DL) architectures have not only improved the performance of predictive models but also enhanced their interpretability potential, thus considerably increasing their transparency. In the context of medicinal chemistry, the potential to not only accurately predict molecular properties, but also chemically interpret them, would be strongly preferred. Previously, we developed accurate multi-task convolutional neural network (CNN) and graph convolutional neural network (GCNN) models to predict a set of diverse intrinsic metabolic clearance parameters from image- and graph-based molecular representations, respectively. Herein, we introduce several model interpretability frameworks to answer whether the model explanations obtained from CNN and GCNN multi-task clearance models could be applied to predict chemical transformations associated with experimentally confirmed metabolic products. We show a strong correlation between the CNN pixel intensities and corresponding clearance predictions, as well as their robustness to different molecular orientations. 
Using actual case examples, we demonstrate that both CNN and GCNN interpretations frequently complement each other, suggesting their high potential for combined use in guiding medicinal chemistry design.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"2 ","pages":"Article 100048"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318522000186/pdfft?md5=fc7537dd4777fa93dd0a74d1d81c0c55&pid=1-s2.0-S2667318522000186-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41622538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding uncertainty in deep learning builds confidence","authors":"Jürgen Bajorath","doi":"10.1016/j.ailsci.2022.100033","DOIUrl":"10.1016/j.ailsci.2022.100033","url":null,"abstract":"","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"2 ","pages":"Article 100033"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318522000046/pdfft?md5=b881f0e2a53af340f6a1b73b950b6d6f&pid=1-s2.0-S2667318522000046-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44028547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep learning of protein–ligand interactions—Remembering the actors","authors":"Jürgen Bajorath","doi":"10.1016/j.ailsci.2022.100037","DOIUrl":"10.1016/j.ailsci.2022.100037","url":null,"abstract":"","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"2 ","pages":"Article 100037"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318522000083/pdfft?md5=63e8dc2f154d93e6ede44a89727be89e&pid=1-s2.0-S2667318522000083-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44934025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding the performance of knowledge graph embeddings in drug discovery","authors":"Stephen Bonner , Ian P. Barrett , Cheng Ye , Rowan Swiers , Ola Engkvist , Charles Tapley Hoyt , William L. Hamilton","doi":"10.1016/j.ailsci.2022.100036","DOIUrl":"https://doi.org/10.1016/j.ailsci.2022.100036","url":null,"abstract":"<div><p>Knowledge Graphs (KG) and associated Knowledge Graph Embedding (KGE) models have recently begun to be explored in the context of drug discovery and have the potential to assist in key challenges such as target identification. In the drug discovery domain, KGs can be employed as part of a process which can result in lab-based experiments being performed, or impact on other decisions, incurring significant time and financial costs and, most importantly, ultimately influencing patient healthcare. For KGE models to have impact in this domain, a better understanding not only of performance but also of the various factors which determine it is required.</p><p>In this study we investigate, over the course of many thousands of experiments, the predictive performance of five KGE models on two public drug discovery-oriented KGs. Our goal is not to focus on the best overall model or configuration; instead, we take a deeper look at how performance can be affected by changes in the training setup, choice of hyperparameters, model parameter initialisation seed and different splits of the datasets. Our results highlight that these factors have a significant impact on performance and can even affect the ranking of models. 
Indeed, these factors should be reported along with model architectures to ensure complete reproducibility and fair comparisons of future work, and we argue this is critical for the acceptance, use, and impact of KGEs in a biomedical setting.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":"2 ","pages":"Article 100036"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2667318522000071/pdfft?md5=06ed4e6a1e3c501ecb6c465108f88691&pid=1-s2.0-S2667318522000071-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91728647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}