Classification of hepatotoxicity of compounds based on cytotoxicity assays is improved by additional interpretable summaries of high-dimensional gene expression data
Marieke Stolte, Wiebke Albrecht, Tim Brecklinghaus, Lisa Gründler, Peng Chen, Jan G. Hengstler, Franziska Kappenberg, Jörg Rahnenführer
Computational Toxicology, volume 28, Article 100288. Published 2023-09-15. DOI: 10.1016/j.comtox.2023.100288. Impact factor 3.1 (JCR Q2, Toxicology). https://www.sciencedirect.com/science/article/pii/S2468111323000294
Abstract
Established cytotoxicity assays are commonly used for assessing the hepatotoxic risk of a compound. The addition of gene expression measurements from high-dimensional RNAseq experiments offers the potential for improved classification. However, it is generally not clear how best to summarize the high-dimensional gene measurements into meaningful variables. We propose several intuitive methods for dimension reduction of gene expression measurements toward interpretable variables and explore their relevance in predicting hepatotoxicity, using a dataset with 60 compounds.
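As a rough illustration of what summarizing high-dimensional gene measurements into interpretable variables can look like, the sketch below collapses one compound's gene-level values into four numbers. The abstract does not name the summaries the authors used, so the function `summarize_expression`, the variable names, the choice of log2 fold changes as input, and the threshold of 1.0 are all assumptions made for illustration only.

```python
def summarize_expression(log2_fold_changes, threshold=1.0):
    """Collapse one compound's gene-level log2 fold changes into a few variables.

    Hypothetical summaries, not the ones defined in the paper.
    """
    abs_values = [abs(v) for v in log2_fold_changes]
    return {
        # number of genes at or above the (assumed) up-regulation threshold
        "n_up": sum(1 for v in log2_fold_changes if v >= threshold),
        # number of genes at or below the (assumed) down-regulation threshold
        "n_down": sum(1 for v in log2_fold_changes if v <= -threshold),
        # strongest single-gene response in either direction
        "max_abs_lfc": max(abs_values),
        # average response magnitude across all genes
        "mean_abs_lfc": sum(abs_values) / len(abs_values),
    }


# Toy input: five genes for a single compound (made-up values).
example = summarize_expression([0.2, -1.5, 2.0, 0.0, 1.0])
```

Each resulting variable has a direct reading (how many genes respond, and how strongly), which is the kind of interpretability the abstract aims for.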
Different advanced statistical learning algorithms are evaluated as classification methods and their performances are compared on the dataset. The best predictions are achieved by tree-based methods such as random forest and xgboost, and tuning the parameters of the algorithm helps to improve the classification accuracy. It is shown that the simultaneous use of data from cytotoxicity assays and from gene expression variables summarized in different ways has a synergistic effect and leads to a better prediction of hepatotoxicity than either set of variables alone. Further, when gene expression data are summarized, different strategies for the generation of interpretable variables contribute to the overall improved prediction quality. When considering cytotoxicity assays alone, the best classification method yields a mean accuracy of 0.757, while the same method with an optimal choice of variables yields a mean accuracy of 0.811. The overall best value for the mean accuracy is 0.821.
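The comparison of variable sets by mean accuracy can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' pipeline: a nearest-centroid classifier stands in for the tree-based learners, the data are synthetic (60 rows only to mirror the dataset size), and the function names, the 5-fold split, and the seeds are assumptions. The numbers it produces say nothing about the paper's results.

```python
import random


def nearest_centroid_predict(train_X, train_y, test_X):
    """Predict the label whose class centroid is closest (squared Euclidean)."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]

    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda lab: dist(x, centroids[lab])) for x in test_X]


def cv_mean_accuracy(X, y, k=5, seed=0):
    """Mean accuracy over the k folds of one shuffled split."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        preds = nearest_centroid_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in fold]
        )
        correct = sum(1 for p, i in zip(preds, fold) if p == y[i])
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)


# Synthetic stand-in data: two variable sets, each weakly related to the label.
rng = random.Random(1)
y = [i % 2 for i in range(60)]
cyto = [[rng.gauss(0.8 * label, 1.0), rng.gauss(0.0, 1.0)] for label in y]
expr = [[rng.gauss(0.8 * label, 1.0), rng.gauss(0.0, 1.0)] for label in y]
combined = [c + e for c, e in zip(cyto, expr)]  # concatenate both variable sets

results = {
    name: cv_mean_accuracy(X, y)
    for name, X in [("cytotoxicity", cyto), ("expression", expr), ("combined", combined)]
}
```

Holding the classifier and the folds fixed while swapping the variable set is what makes the three accuracies comparable, which is the same logic behind the abstract's contrast between cytotoxicity assays alone and an optimal choice of variables.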
About the journal:
Computational Toxicology is an international journal publishing computational approaches that assist in the toxicological evaluation and safety assessment of new and existing chemical substances. Its scope includes:
- All effects relating to human health and environmental toxicity and fate
- Prediction of toxicity, metabolism, fate and physico-chemical properties
- The development of models from read-across, (Q)SARs, PBPK, QIVIVE, Multi-Scale Models
- Big Data in toxicology: integration, management, analysis
- Implementation of models through AOPs, IATA, TTC
- Regulatory acceptance of models: evaluation, verification and validation
- From metals, to small organic molecules to nanoparticles
- Pharmaceuticals, pesticides, foods, cosmetics, fine chemicals
- Bringing together the views of industry, regulators, academia, NGOs