Can the Predictive Analytics Toolkit (PAT) handle a genomic data set?

Impact factor: 2.9 (JCR Q2, Toxicology)

Computational Toxicology, vol. 24, Article 100241. Pub date: 2022-11-01. DOI: 10.1016/j.comtox.2022.100241

Ted W. Simon, Louis A. (Tony) Cox, Richard A. Becker

Citation count: 0

Abstract

The Predictive Analytics Toolkit (PAT) was developed to facilitate the use of new approach methodologies (NAMs) to predict health hazards and risks from chemicals. PAT is a user-friendly web application that integrates many R packages to enable development and testing of prediction models without any programming. We drew from the work of Ring et al. 2021 (https://doi.org/10.1016/j.comtox.2021.100166), who used random forest models to predict in vivo transcriptomic responses in rat liver from in vitro Tox21 AC50 values for a set of 221 chemicals. Gene ontologies helped identify 735 biological pathways based on differential in vivo expression of specific gene sets. Ring et al. used 12 models that varied in their use of toxicokinetics to predict in vivo activity, with 5000 random forest iterations for each chemical/pathway combination; model performance was measured by the area under the receiver operating characteristic curve (AUC-ROC). The highest-ranking model (Model 10) used Tox21 AC50 nominal concentrations converted to media concentrations and in vivo doses converted to circulating plasma concentrations; the lowest-ranking model (Model 2) used nominal in vitro concentrations and administered in vivo dose levels. Using a subset of 10 pathways from the Ring et al. data, we used PAT to predict the AUC-ROC and to compare the best-performing (Model 10) and worst-performing (Model 2) models with only 100 random forest iterations. In the PAT results, Model 10 “won” in 60% of the comparisons, a value similar to that calculated for the identical set of comparisons using the supplemental data from Ring et al. (52.2%). Hence, PAT can provide a useful alternative to programming in R for prediction modeling and model performance evaluation, even for extensive genomic data sets.
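The comparison described in the abstract rests on two quantities: an AUC-ROC per model and the fraction of paired comparisons that one model "wins". The sketch below illustrates both in plain Python. It is not the authors' code and not PAT's implementation (PAT is built on R packages); the function names `auc_roc` and `win_fraction` are hypothetical, the strict "higher AUC wins" rule is an assumption about how a win is counted, and every number is made up for illustration rather than taken from the paper or from Ring et al.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    # Count pairwise wins of positives over negatives; a tie counts as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def win_fraction(auc_a: Sequence[float], auc_b: Sequence[float]) -> float:
    """Fraction of paired comparisons in which model A has the strictly higher AUC."""
    if len(auc_a) != len(auc_b) or not auc_a:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a > b for a, b in zip(auc_a, auc_b)) / len(auc_a)


# Made-up labels and scores for one hypothetical pathway (NOT data from the paper).
labels = [1, 1, 0, 0, 1, 0]
scores_model_a = [0.9, 0.8, 0.3, 0.4, 0.35, 0.1]
scores_model_b = [0.6, 0.4, 0.5, 0.7, 0.8, 0.2]

auc_a = auc_roc(labels, scores_model_a)
auc_b = auc_roc(labels, scores_model_b)

# Made-up per-comparison AUC values for two hypothetical models.
paired_auc_a = [0.71, 0.64, 0.58, 0.80, 0.66]
paired_auc_b = [0.69, 0.66, 0.55, 0.75, 0.70]
share_won_by_a = win_fraction(paired_auc_a, paired_auc_b)

print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}, A wins {share_won_by_a:.0%}")
```

The pairwise formulation is used here because it needs no external library and makes the meaning of AUC-ROC explicit; for large data sets a rank-based or library implementation would be the more practical choice.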

Source journal

Computational Toxicology (Computer Science: Computer Science Applications)

CiteScore: 5.50
Self-citation rate: 0.00%
Review time: 56 days

About the journal: Computational Toxicology is an international journal publishing computational approaches that assist in the toxicological evaluation and safety assessment of new and existing chemical substances. Its scope covers:
- All effects relating to human health and environmental toxicity and fate
- Prediction of toxicity, metabolism, fate and physico-chemical properties
- The development of models from read-across, (Q)SARs, PBPK, QIVIVE, Multi-Scale Models
- Big Data in toxicology: integration, management, analysis
- Implementation of models through AOPs, IATA, TTC
- Regulatory acceptance of models: evaluation, verification and validation
- From metals, to small organic molecules to nanoparticles
- Pharmaceuticals, pesticides, foods, cosmetics, fine chemicals
- Bringing together the views of industry, regulators, academia, NGOs