An alternative to Cox’s regression for multiple survival curves comparison: A random forest-based approach using covariate structure

2021 International Conference on Computing, Computational Modelling and Applications (ICCMA). Pub Date: 2021-07-01. DOI: 10.1109/ICCMA53594.2021.00029

Lubomír Štěpánek, Filip Habarta, I. Malá, L. Marek



Abstract

There are several established methods for comparing more than two survival curves, namely the scale-rank test and Cox’s proportional hazards model. However, when their statistical assumptions are not met, the validity of their results is affected. In this study, we address this issue and propose a new statistical approach for comparing more than two survival curves using a random forest algorithm, which is practically assumption-free. Repeatedly generating the many decision trees that make up one random forest model makes it possible to calculate the proportion of trees with sufficient complexity that classify into all groups (depicted by their survival curves); this proportion serves as the p-value estimate, analogous to the output of the classical Wald’s t-test in Cox’s regression. Furthermore, the level of pruning of the decision trees from which the random forest model is built can modify both the robustness and the statistical power of the random forest alternative. The discussed results are confirmed using COVID-19 survival data with varying tree pruning levels. The introduced method for comparing survival curves, based on the random forest algorithm, seems to be a valid alternative to Cox’s regression; unlike Cox’s regression, however, it has no statistical assumptions and tends to reach higher statistical power.
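
The abstract does not spell out the algorithm, so the following Python sketch is only one possible reading of the idea, not the authors' implementation. It assumes that each tree classifies individuals into their groups from the covariates (time, event indicator), that a depth cap `max_depth` stands in for the pruning level, and that a tree "classifies into all groups" when its leaves jointly predict every group. All function and parameter names are hypothetical. The sketch computes only the proportion of such trees; how the paper turns this proportion into the p-value estimate is not reproduced here.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of group labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in features:
        # Dropping the largest value keeps both child nodes non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def leaf_classes(rows, labels, depth, max_depth, mtry, rng):
    """Grow one tree recursively; return the set of majority groups at its leaves."""
    split = None
    if depth < max_depth and len(set(labels)) > 1:
        candidates = rng.sample(range(len(rows[0])), mtry)
        split = best_split(rows, labels, candidates)
    if split is None:  # leaf: predict the majority group
        return {Counter(labels).most_common(1)[0][0]}
    f, t = split
    found = set()
    for keep_left in (True, False):
        part = [(r, y) for r, y in zip(rows, labels) if (r[f] <= t) == keep_left]
        found |= leaf_classes([r for r, _ in part], [y for _, y in part],
                              depth + 1, max_depth, mtry, rng)
    return found


def proportion_separating_trees(rows, labels, n_trees=200, max_depth=2,
                                mtry=None, seed=0):
    """Proportion of bootstrap-grown trees whose leaves predict every group."""
    rng = random.Random(seed)
    groups = set(labels)
    n = len(rows)
    mtry = len(rows[0]) if mtry is None else mtry  # assumption: all features by default
    hits = 0
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        grown = leaf_classes([rows[i] for i in idx], [labels[i] for i in idx],
                             0, max_depth, mtry, rng)
        hits += grown == groups
    return hits / n_trees


# Toy data (made up for illustration): three groups with clearly different times.
rows, labels = [], []
for group, start in (("A", 1), ("B", 30), ("C", 60)):
    for i in range(20):
        rows.append((start + i, i % 2))  # (time, event indicator)
        labels.append(group)

print(proportion_separating_trees(rows, labels, max_depth=2))
print(proportion_separating_trees(rows, labels, max_depth=1))
```

With three groups, a tree capped at depth 1 has at most two leaves and can never predict all groups, which illustrates why the pruning level affects the proportion and hence the sensitivity of the procedure.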

