Classification of hepatotoxicity of compounds based on cytotoxicity assays is improved by additional interpretable summaries of high-dimensional gene expression data

IF 3.1 Q2 TOXICOLOGY

Computational Toxicology, Pub Date: 2023-09-15, DOI: 10.1016/j.comtox.2023.100288

Marieke Stolte, Wiebke Albrecht, Tim Brecklinghaus, Lisa Gründler, Peng Chen, Jan G. Hengstler, Franziska Kappenberg, Jörg Rahnenführer

Volume 28, Article 100288. https://www.sciencedirect.com/science/article/pii/S2468111323000294

Citations: 0

Abstract

Established cytotoxicity assays are commonly used for assessing the hepatotoxic risk of a compound. The addition of gene expression measurements from high-dimensional RNAseq experiments offers the potential for improved classification. However, it is generally not clear how best to summarize the high-dimensional gene measurements into meaningful variables. We propose several intuitive methods for dimension reduction of gene expression measurements toward interpretable variables and explore their relevance in predicting hepatotoxicity, using a dataset with 60 compounds.
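The abstract does not say which summaries the authors use. As a purely illustrative sketch, assuming each compound comes with per-gene log2 fold changes, one way to collapse thousands of genes into a few readable variables is to count genes beyond a fold-change threshold. The function name, the threshold of 1.0, and the three output variables are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def summarize_expression(
    log2_fold_changes: Sequence[float], threshold: float = 1.0
) -> Dict[str, float]:
    """Collapse one compound's per-gene log2 fold changes into a few variables.

    Hypothetical summaries (assumptions, not the paper's definitions):
    - n_up: genes with log2 fold change >= threshold
    - n_down: genes with log2 fold change <= -threshold
    - max_abs_log2fc: largest absolute log2 fold change
    """
    n_up = sum(1 for v in log2_fold_changes if v >= threshold)
    n_down = sum(1 for v in log2_fold_changes if v <= -threshold)
    max_abs = max((abs(v) for v in log2_fold_changes), default=0.0)
    return {"n_up": n_up, "n_down": n_down, "max_abs_log2fc": max_abs}


# Toy input: six made-up fold changes for a single compound.
example = summarize_expression([2.3, -0.2, 0.1, -1.7, 1.0, 0.4])
print(example)  # {'n_up': 2, 'n_down': 1, 'max_abs_log2fc': 2.3}
```

Each output is a single number per compound with a direct biological reading, so it could sit next to cytotoxicity assay readouts in an ordinary feature table.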

Several advanced statistical learning algorithms are evaluated as classification methods, and their performance is compared on the dataset. The best predictions are achieved by tree-based methods such as random forest and xgboost, and tuning the algorithms' parameters helps to improve classification accuracy. The simultaneous use of data from cytotoxicity assays and of gene expression variables summarized in different ways is shown to have a synergistic effect, leading to a better prediction of hepatotoxicity than either set of variables alone. Further, when gene expression data are summarized, different strategies for generating interpretable variables contribute to the overall improved prediction quality. With cytotoxicity assays alone, the best classification method yields a mean accuracy of 0.757, while the same classification method with an optimal choice of variables yields a mean accuracy of 0.811. The overall best mean accuracy is 0.821.
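The comparison of variable sets by mean accuracy can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the features are random numbers with an artificial class shift, the balanced labels, the 5-fold cross-validation repeated 3 times, and the random forest settings are all assumptions. Only the sample size of 60 compounds comes from the abstract. The sketch assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n_compounds = 60  # dataset size stated in the abstract

# Synthetic, balanced labels (assumption): 0 = non-hepatotoxic, 1 = hepatotoxic.
y = np.repeat([0, 1], n_compounds // 2)

# Synthetic stand-ins for the two variable groups, each with a weak class shift.
cytotoxicity = rng.normal(size=(n_compounds, 3)) + 0.8 * y[:, None]
gene_summaries = rng.normal(size=(n_compounds, 4)) + 0.8 * y[:, None]

feature_sets = {
    "cytotoxicity only": cytotoxicity,
    "gene summaries only": gene_summaries,
    "combined": np.hstack([cytotoxicity, gene_summaries]),
}

# Repeated stratified cross-validation; the fold and repeat counts are assumptions.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)


def mean_accuracy(X: np.ndarray, labels: np.ndarray) -> float:
    """Mean cross-validated accuracy of a random forest on one feature set."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(model, X, labels, cv=cv, scoring="accuracy")
    return float(scores.mean())


results = {name: mean_accuracy(X, y) for name, X in feature_sets.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

The printed numbers depend entirely on the synthetic data and say nothing about the accuracies reported above. The point is the evaluation pattern: the same classifier and the same resampling scheme applied to each variable set, so that differences in mean accuracy can be attributed to the variables.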




Source journal

Computational Toxicology (Computer Science: Computer Science Applications)

CiteScore

5.50

Self-citation rate

0.00%

Articles published

Review time

56 days

About the journal: Computational Toxicology is an international journal publishing computational approaches that assist in the toxicological evaluation and safety assessment of new and existing chemical substances. Its scope covers:
- All effects relating to human health and environmental toxicity and fate
- Prediction of toxicity, metabolism, fate and physico-chemical properties
- The development of models from read-across, (Q)SARs, PBPK, QIVIVE, Multi-Scale Models
- Big Data in toxicology: integration, management, analysis
- Implementation of models through AOPs, IATA, TTC
- Regulatory acceptance of models: evaluation, verification and validation
- From metals to small organic molecules to nanoparticles
- Pharmaceuticals, pesticides, foods, cosmetics, fine chemicals
- Bringing together the views of industry, regulators, academia, NGOs