{"title":"Quantifying sources of uncertainty in drug discovery predictions with probabilistic models","authors":"Stanley E. Lazic , Dominic P. Williams","doi":"10.1016/j.ailsci.2021.100004","DOIUrl":null,"url":null,"abstract":"<div><p>Knowing the uncertainty in a prediction is critical when making expensive investment decisions and when patient safety is paramount, but machine learning (ML) models in drug discovery typically only provide a single best estimate and ignore all sources of uncertainty. Predictions from these models may therefore be over-confident, which can put patients at risk and waste resources when compounds that are destined to fail are further developed. Probabilistic predictive models (PPMs) can incorporate all sources of uncertainty and they return a distribution of predicted values that represents the uncertainty in the prediction. We describe seven sources of uncertainty in PPMs: data, distribution function, mean function, variance function, link function(s), parameters, and hyperparameters. We use toxicity prediction as a running example, but the same principles apply for all prediction models. The consequences of ignoring uncertainty and how PPMs account for uncertainty are also described. We aim to make the discussion accessible to a broad non-mathematical audience. Equations are provided to make ideas concrete for mathematical readers (but can be skipped without loss of understanding) and code is available for computational researchers (<span>https://github.com/stanlazic/ML_uncertainty_quantification</span><svg><path></path></svg>).</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/j.ailsci.2021.100004","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial intelligence in the life sciences","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667318521000040","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8
Abstract
Knowing the uncertainty in a prediction is critical when making expensive investment decisions and when patient safety is paramount, but machine learning (ML) models in drug discovery typically only provide a single best estimate and ignore all sources of uncertainty. Predictions from these models may therefore be over-confident, which can put patients at risk and waste resources when compounds that are destined to fail are further developed. Probabilistic predictive models (PPMs) can incorporate all sources of uncertainty and they return a distribution of predicted values that represents the uncertainty in the prediction. We describe seven sources of uncertainty in PPMs: data, distribution function, mean function, variance function, link function(s), parameters, and hyperparameters. We use toxicity prediction as a running example, but the same principles apply for all prediction models. The consequences of ignoring uncertainty and how PPMs account for uncertainty are also described. We aim to make the discussion accessible to a broad non-mathematical audience. Equations are provided to make ideas concrete for mathematical readers (but can be skipped without loss of understanding) and code is available for computational researchers (https://github.com/stanlazic/ML_uncertainty_quantification).
Artificial intelligence in the life sciencesPharmacology, Biochemistry, Genetics and Molecular Biology (General), Computer Science Applications, Health Informatics, Drug Discovery, Veterinary Science and Veterinary Medicine (General)