Can the Predictive Analytics Toolkit (PAT) handle a genomic data set?
Ted W. Simon, Louis A. (Tony) Cox, Richard A. Becker
Computational Toxicology (journal article), published 2022-11-01
DOI: 10.1016/j.comtox.2022.100241
Abstract
The Predictive Analytics Toolkit (PAT) was developed to facilitate the use of new approach methodologies (NAMs) to predict health hazards and risks from chemicals. PAT is a user-friendly web application that integrates many R packages so that prediction models can be developed and tested without any programming. We drew on the work of Ring et al. 2021 (https://doi.org/10.1016/j.comtox.2021.100166), who used random forest models to predict in vivo transcriptomic responses in rat liver from in vitro Tox21 AC50 values for a set of 221 chemicals. Gene ontologies helped identify 735 biological pathways based on differential in vivo expression of specific gene sets. Ring et al. used 12 models that differed in their use of toxicokinetics to predict in vivo activity, running 5000 random forest iterations for each chemical/pathway combination; model performance was measured as the area under the receiver operating characteristic curve (AUC-ROC). The highest-ranking model (Model 10) used Tox21 AC50 nominal concentrations converted to media concentrations and in vivo doses converted to circulating plasma concentrations; the lowest-ranking model (Model 2) used nominal in vitro concentrations and administered in vivo dose levels. Using a subset of 10 pathways from the Ring et al. data, we used PAT to predict the AUC-ROC and to compare the best-performing (Model 10) and worst-performing (Model 2) models with only 100 random forest iterations. In the PAT results, Model 10 “won” in 60% of the comparisons, a value similar to that calculated for the identical set of comparisons from the supplemental data of Ring et al. (52.2%). Hence, PAT can provide a useful alternative to programming in R for prediction modeling and model performance evaluation, even for extensive genomic data sets.
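The performance measure and the head-to-head tally described in the abstract can be illustrated with a short, dependency-free Python sketch. This is not PAT's code or Ring et al.'s code. `auc_roc` computes AUC-ROC through its Mann-Whitney interpretation: the probability that a randomly chosen active chemical is scored above a randomly chosen inactive one. `win_rate` gives the share of paired comparisons in which one model's AUC-ROC is strictly higher than the other's. Several elements are hypothetical illustrations rather than values or rules from either study: the function names, the assumption that each comparison pairs the two models on the same pathway, the "strictly greater counts as a win" rule, and every number below.

```python
"""Sketch: score models by AUC-ROC and count pairwise "wins".

Hypothetical assumptions (not from the paper): each comparison pairs the
two models' AUC-ROC on the same pathway, a strictly higher AUC counts as a
win, and ties are a win for neither model. All numbers are made up.
"""
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC via the Mann-Whitney U statistic (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0  # positive ranked above negative
            elif p == n:
                concordant += 0.5  # tied scores get half credit
    return concordant / (len(pos) * len(neg))


def win_rate(auc_a: Sequence[float], auc_b: Sequence[float]) -> float:
    """Fraction of paired comparisons in which model A's AUC beats model B's."""
    if len(auc_a) != len(auc_b) or not auc_a:
        raise ValueError("need equal-length, non-empty AUC lists")
    wins = sum(1 for a, b in zip(auc_a, auc_b) if a > b)
    return wins / len(auc_a)


if __name__ == "__main__":
    # Hypothetical in vivo activity labels (1 = active) for five chemicals
    labels = [1, 0, 1, 0, 0]
    print(auc_roc(labels, [0.9, 0.2, 0.7, 0.4, 0.1]))  # prints 1.0

    # Hypothetical per-pathway AUC-ROC values for two models
    model_10 = [0.71, 0.64, 0.58, 0.80]
    model_2 = [0.69, 0.66, 0.52, 0.75]
    print(win_rate(model_10, model_2))  # prints 0.75
```

The pairwise double loop is quadratic in the number of chemicals. That is acceptable at the scale of a few hundred chemicals, and it keeps the rank interpretation of AUC-ROC explicit. A production workflow would more likely call an existing routine such as scikit-learn's `roc_auc_score` or an R package, which is the kind of tool PAT wraps.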
About the journal:
Computational Toxicology is an international journal publishing computational approaches that support the toxicological evaluation and safety assessment of new and existing chemical substances. Its scope covers:
- All effects relating to human health and environmental toxicity and fate
- Prediction of toxicity, metabolism, fate and physico-chemical properties
- The development of models from read-across, (Q)SARs, PBPK, QIVIVE, and multi-scale models
- Big data in toxicology: integration, management, analysis
- Implementation of models through AOPs, IATA, TTC
- Regulatory acceptance of models: evaluation, verification and validation
- Substances ranging from metals and small organic molecules to nanoparticles
- Pharmaceuticals, pesticides, foods, cosmetics, fine chemicals
- Bringing together the views of industry, regulators, academia, NGOs