Lubomír Štěpánek, Filip Habarta, I. Malá, L. Marek
An alternative to Cox’s regression for multiple survival curves comparison: A random forest-based approach using covariate structure
2021 International Conference on Computing, Computational Modelling and Applications (ICCMA), published 2021-07-01. DOI: 10.1109/ICCMA53594.2021.00029 (https://doi.org/10.1109/ICCMA53594.2021.00029)
Citations: 0
Abstract
There are several established methods for comparing more than two survival curves, namely the scale-rank test and Cox’s proportional hazards model. However, when their statistical assumptions are not met, the validity of their results is affected. In this study, we address this issue and propose a new statistical approach to comparing more than two survival curves using a random forest algorithm, which is practically assumption-free. Repeatedly generating many decision trees within one random forest model makes it possible to calculate the proportion of trees of sufficient complexity that classify into all groups (each group depicted by its survival curve); this proportion is the p-value estimate, analogous to the output of the classical Wald’s t-test in Cox’s regression. Furthermore, the pruning level of the decision trees from which the random forest model is built can modify both the robustness and the statistical power of the random forest alternative. These results are confirmed on COVID-19 survival data while varying the tree pruning level. The introduced method for survival curve comparison, based on the random forest algorithm, appears to be a valid alternative to Cox’s regression; moreover, it has no statistical assumptions and tends to reach higher statistical power.
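The counting step described in the abstract can be sketched in plain Python without external libraries. This is a minimal illustration under stated assumptions, not the authors’ implementation: each tree is grown on a bootstrap sample with a random feature subset at every node and Gini splits (standard random-forest choices, assumed here), the "pruning level" is modeled as a maximum tree depth `max_depth` (an assumption; the paper may prune differently), and a tree counts as "classifying into all groups" when the majority labels of its leaves include every group. The feature vectors (for example, covariates, or survival time and event status) and all function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class (group) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return the (feature, threshold) pair that most reduces weighted Gini impurity, or None."""
    best, best_score, n = None, gini(y), len(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def leaf_labels(X, y, depth, max_depth, n_sub, rng):
    """Grow a depth-limited tree and return the set of majority labels of its leaves."""
    if depth >= max_depth or len(set(y)) == 1:
        return {Counter(y).most_common(1)[0][0]}
    # Random feature subset at each node (assumed random-forest behaviour).
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:
        return {Counter(y).most_common(1)[0][0]}
    f, t = split
    labels = set()
    for keep in (lambda v: v <= t, lambda v: v > t):
        part = [(row, lab) for row, lab in zip(X, y) if keep(row[f])]
        labels |= leaf_labels([r for r, _ in part], [l for _, l in part],
                              depth + 1, max_depth, n_sub, rng)
    return labels


def proportion_covering_all_groups(X, y, n_trees=200, max_depth=3, seed=0):
    """Hypothetical helper: share of bootstrap trees whose leaves predict every group.

    max_depth stands in for the pruning level (an assumption).
    """
    rng = random.Random(seed)
    groups, n = set(y), len(y)
    n_sub = max(1, int(len(X[0]) ** 0.5))
    hits = 0
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if leaf_labels(Xb, yb, 0, max_depth, n_sub, rng) == groups:
            hits += 1
    return hits / n_trees
```

With three groups, a tree of depth 1 has at most two leaves and can never cover all groups, so `max_depth=1` always yields 0.0. This illustrates how the pruning level constrains which trees are counted. Turning the proportion into a calibrated p-value is left to the method described in the paper.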