Matthew Berkowitz, Rachel MacKay Altman, Thomas M. Loughin
{"title":"生存数据的随机森林:哪些方法在哪些条件下最有效?","authors":"Matthew Berkowitz, Rachel MacKay Altman, Thomas M. Loughin","doi":"10.1515/ijb-2023-0056","DOIUrl":null,"url":null,"abstract":"Few systematic comparisons of methods for constructing survival trees and forests exist in the literature. Importantly, when the goal is to predict a survival time or estimate a survival function, the optimal choice of method is unclear. We use an extensive simulation study to systematically investigate various factors that influence survival forest performance – forest construction method, censoring, sample size, distribution of the response, structure of the linear predictor, and presence of correlated or noisy covariates. In particular, we study 11 methods that have recently been proposed in the literature and identify 6 top performers. We find that all the factors that we investigate have significant impact on the methods’ relative accuracy of point predictions of survival times and survival function estimates. We use our results to make recommendations for which methods to use in a given context and offer explanations for the observed differences in relative performance.","PeriodicalId":49058,"journal":{"name":"International Journal of Biostatistics","volume":"8 1","pages":""},"PeriodicalIF":1.0000,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Random forests for survival data: which methods work best and under what conditions?\",\"authors\":\"Matthew Berkowitz, Rachel MacKay Altman, Thomas M. Loughin\",\"doi\":\"10.1515/ijb-2023-0056\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Few systematic comparisons of methods for constructing survival trees and forests exist in the literature. Importantly, when the goal is to predict a survival time or estimate a survival function, the optimal choice of method is unclear. We use an extensive simulation study to systematically investigate various factors that influence survival forest performance – forest construction method, censoring, sample size, distribution of the response, structure of the linear predictor, and presence of correlated or noisy covariates. In particular, we study 11 methods that have recently been proposed in the literature and identify 6 top performers. We find that all the factors that we investigate have significant impact on the methods’ relative accuracy of point predictions of survival times and survival function estimates. We use our results to make recommendations for which methods to use in a given context and offer explanations for the observed differences in relative performance.\",\"PeriodicalId\":49058,\"journal\":{\"name\":\"International Journal of Biostatistics\",\"volume\":\"8 1\",\"pages\":\"\"},\"PeriodicalIF\":1.0000,\"publicationDate\":\"2024-04-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Biostatistics\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1515/ijb-2023-0056\",\"RegionNum\":4,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Biostatistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1515/ijb-2023-0056","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
Random forests for survival data: which methods work best and under what conditions?
Few systematic comparisons of methods for constructing survival trees and forests exist in the literature. Importantly, when the goal is to predict a survival time or estimate a survival function, the optimal choice of method is unclear. We use an extensive simulation study to systematically investigate various factors that influence survival forest performance – forest construction method, censoring, sample size, distribution of the response, structure of the linear predictor, and presence of correlated or noisy covariates. In particular, we study 11 methods that have recently been proposed in the literature and identify 6 top performers. We find that all the factors that we investigate have significant impact on the methods’ relative accuracy of point predictions of survival times and survival function estimates. We use our results to make recommendations for which methods to use in a given context and offer explanations for the observed differences in relative performance.
期刊介绍:
The International Journal of Biostatistics (IJB) seeks to publish new biostatistical models and methods, new statistical theory, as well as original applications of statistical methods, for important practical problems arising from the biological, medical, public health, and agricultural sciences with an emphasis on semiparametric methods. Given many alternatives to publish exist within biostatistics, IJB offers a place to publish for research in biostatistics focusing on modern methods, often based on machine-learning and other data-adaptive methodologies, as well as providing a unique reading experience that compels the author to be explicit about the statistical inference problem addressed by the paper. IJB is intended that the journal cover the entire range of biostatistics, from theoretical advances to relevant and sensible translations of a practical problem into a statistical framework. Electronic publication also allows for data and software code to be appended, and opens the door for reproducible research allowing readers to easily replicate analyses described in a paper. Both original research and review articles will be warmly received, as will articles applying sound statistical methods to practical problems.