Towards quantifying the uncertainty in in silico predictions using Bayesian learning

Pub Date: 2022-08-01 DOI: 10.1016/j.comtox.2022.100228 (Volume 23, Article 100228)

Timothy E.H. Allen, Alistair M. Middleton, Jonathan M. Goodman, Paul J. Russell, Predrag Kukic, Steve Gutsell


Citations: 2

Abstract

Next-generation risk assessment (NGRA) combines in vitro and in silico models for more human-relevant, ethical, and sustainable human chemical safety assessment. NGRA requires a quantitative mechanistic understanding of the effects of chemicals across human biology (be they molecular, cellular, organ-level or higher), coupled with a quantitative understanding of the uncertainty in any experimentally measured or predicted values. These values and their uncertainties can be treated as a probability distribution, which can then be compared to exposure estimates to establish the presence or absence of a margin of safety. We have constructed Bayesian learning neural networks to provide such quantitative predictions and uncertainties for 20 pharmacologically important human molecular initiating events (MIEs). These models produce high-quality quantitative estimates (p(IC50), p(EC50), p(Ki), p(Kd)) of biochemical activity at an MIE, with average mean absolute errors (in log units) of 0.625 ± 0.048 on test data and 0.941 ± 0.215 on external validation data. The key advantage of these models is that they also produce standard deviations and credible intervals (CIs) to quantify the uncertainty in these predictions, which we show can distinguish between molecules close to the training data in chemical structure, those less similar to the training data, and decoy compounds drawn from the wider ChEMBL database. With these uncertainty values, a user can understand how certain a given prediction is, much as with a quantitative applicability domain, which makes the predictions more useful in NGRA. The ability of in silico methods to produce quantitative predictions with these kinds of probability distributions will be vital to their further use in NGRA, and clear first steps have been taken here.
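
To make the idea of comparing a predictive distribution against an exposure estimate concrete, the following is a minimal sketch and not the authors' implementation. It assumes the prediction for a p-scale activity value (for example p(IC50), taken here as -log10 of a molar concentration) can be approximated by a Gaussian with the model's predicted mean and standard deviation, and it assumes a margin of safety exists when the active concentration exceeds the exposure concentration by a chosen number of log units. The function names `credible_interval`, `prob_margin_of_safety`, and `mean_absolute_error` are hypothetical, as are all numeric inputs.

```python
import math
from statistics import NormalDist


def credible_interval(mean, sd, level=0.95):
    """Central interval of a Gaussian predictive distribution (assumed shape)."""
    dist = NormalDist(mean, sd)
    tail = (1.0 - level) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


def prob_margin_of_safety(pred_mean, pred_sd, exposure_molar, margin_log_units=0.0):
    """Probability that the active concentration exceeds exposure by the margin.

    Assumption: p-activity = -log10(concentration in mol/L), so a higher
    p-value means a more potent compound (active at lower concentration).
    """
    p_exposure = -math.log10(exposure_molar)
    # Active concentration > exposure * 10**margin
    # is equivalent to p-activity < p_exposure - margin.
    threshold = p_exposure - margin_log_units
    return NormalDist(pred_mean, pred_sd).cdf(threshold)


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error in log units between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical prediction: p(IC50) of 6.0 with a standard deviation of 1.0.
    low, high = credible_interval(6.0, 1.0)
    print(f"95% interval: {low:.2f} to {high:.2f}")
    # Hypothetical exposure of 1e-8 mol/L, requiring one log unit of margin.
    print(f"P(margin of safety): {prob_margin_of_safety(6.0, 1.0, 1e-8, 1.0):.3f}")
```

A wider predicted standard deviation flattens the distribution, so for a compound far from the training data the probability of a margin of safety moves toward 0.5 rather than toward 0 or 1, which is how the uncertainty would feed into a decision under these assumptions.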
