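Classification of hepatotoxicity of compounds based on cytotoxicity assays is improved by additional interpretable summaries of high-dimensional gene expression data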

Impact factor: 3.1; JCR quartile: Q2 (Toxicology)
Marieke Stolte, Wiebke Albrecht, Tim Brecklinghaus, Lisa Gründler, Peng Chen, Jan G. Hengstler, Franziska Kappenberg, Jörg Rahnenführer
Computational Toxicology, volume 28, Article 100288. Published 2023-09-15. DOI: 10.1016/j.comtox.2023.100288
Source: https://www.sciencedirect.com/science/article/pii/S2468111323000294
Citations: 0

Abstract

Established cytotoxicity assays are commonly used to assess the hepatotoxic risk of a compound. Adding gene expression measurements from high-dimensional RNA-seq experiments offers the potential for improved classification. However, it is generally not clear how best to summarize the high-dimensional gene measurements into meaningful variables. We propose several intuitive methods for reducing the dimension of gene expression measurements to interpretable variables and explore their relevance in predicting hepatotoxicity, using a dataset with 60 compounds.
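
As a rough illustration of what "summarizing gene measurements into interpretable variables" can mean, the sketch below collapses per-gene log2 fold changes for one compound into a handful of readable numbers. The abstract does not say which summaries the authors use, so the function name `summarize_expression`, the threshold of 1.0, the chosen statistics, and the example values are all assumptions made for illustration only.

```python
from statistics import mean


def summarize_expression(log2_fold_changes, threshold=1.0):
    """Collapse one compound's per-gene log2 fold changes into a few summary variables.

    Hypothetical summaries, not the authors' definitions:
    counts of strongly up- and down-regulated genes plus two magnitude statistics.
    Expects a non-empty mapping of gene name -> log2 fold change.
    """
    values = list(log2_fold_changes.values())
    abs_values = [abs(v) for v in values]
    return {
        "n_up": sum(v >= threshold for v in values),      # genes at or above +threshold
        "n_down": sum(v <= -threshold for v in values),   # genes at or below -threshold
        "mean_abs_lfc": mean(abs_values),                 # average effect size
        "max_abs_lfc": max(abs_values),                   # strongest single effect
    }


# Made-up fold changes for a single compound (illustrative values only).
example = {"CYP1A1": 3.2, "ALB": -1.5, "GAPDH": 0.1, "TP53": 0.8}
summary = summarize_expression(example)
print(summary)
```

Each resulting variable has a direct reading (for example, "how many genes respond strongly"), which is the property the abstract emphasizes over opaque projections of the full gene matrix.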

Different advanced statistical learning algorithms are evaluated as classification methods, and their performance is compared on the dataset. The best predictions are achieved by tree-based methods such as random forest and XGBoost, and tuning the algorithm's parameters helps to improve classification accuracy. Using data from cytotoxicity assays together with gene expression variables summarized in different ways has a synergistic effect and predicts hepatotoxicity better than either set of variables alone. Further, when gene expression data are summarized, different strategies for generating interpretable variables each contribute to the overall improvement in prediction quality. With cytotoxicity assays alone, the best classification method yields a mean accuracy of 0.757, while the same classification method with an optimal choice of variables yields a mean accuracy of 0.811. The overall best mean accuracy is 0.821.
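
The comparison described above (one classifier, several variable sets, mean accuracy over repeated resampling) can be sketched with scikit-learn as follows. This is a minimal sketch on synthetic data, not the authors' pipeline: the variable blocks `cytotox` and `gene_summaries`, their dimensions, the effect size, and the cross-validation settings are assumptions, and the printed accuracies are not expected to match the figures reported in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n_compounds = 60  # same size as the dataset in the abstract; all values are synthetic

# Synthetic labels: 1 = hepatotoxic, 0 = non-hepatotoxic (balanced by construction).
y = np.repeat([0, 1], n_compounds // 2)

# Hypothetical variable blocks: Gaussian noise shifted by the label.
cytotox = rng.normal(size=(n_compounds, 3)) + 0.8 * y[:, None]
gene_summaries = rng.normal(size=(n_compounds, 4)) + 0.8 * y[:, None]

feature_sets = {
    "cytotoxicity only": cytotox,
    "gene summaries only": gene_summaries,
    "combined": np.hstack([cytotox, gene_summaries]),
}

# Repeated stratified k-fold keeps the class balance in every fold.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)

mean_accuracy = {}
for name, X in feature_sets.items():
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    mean_accuracy[name] = float(scores.mean())
    print(f"{name}: mean accuracy = {mean_accuracy[name]:.3f}")
```

Parameter tuning, which the abstract reports as helpful, could be layered on by wrapping the model in scikit-learn's `GridSearchCV` inside each outer fold, so that the tuned settings are never chosen on the data used to score them.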

Source journal
Computational Toxicology (Computer Science: Computer Science Applications)
CiteScore: 5.50
Self-citation rate: 0.00%
Articles published: 53
Review time: 56 days
Journal description: Computational Toxicology is an international journal publishing computational approaches that assist in the toxicological evaluation and safety assessment of new and existing chemical substances.
- All effects relating to human health and environmental toxicity and fate
- Prediction of toxicity, metabolism, fate and physico-chemical properties
- The development of models from read-across, (Q)SARs, PBPK, QIVIVE, Multi-Scale Models
- Big Data in toxicology: integration, management, analysis
- Implementation of models through AOPs, IATA, TTC
- Regulatory acceptance of models: evaluation, verification and validation
- From metals, to small organic molecules to nanoparticles
- Pharmaceuticals, pesticides, foods, cosmetics, fine chemicals
- Bringing together the views of industry, regulators, academia, NGOs