Lotta M Meijerink, Ewoud Schuit, Karel G M Moons, Artuur M Leeuwenberg
Journal of Clinical Epidemiology, article 111834. Published 2025-05-19. DOI: https://doi.org/10.1016/j.jclinepi.2025.111834
Plug-and-play use of tree-based methods: Consequences for clinical prediction modelling.
Objective: Tree-based models such as Random Forest and XGBoost are increasingly being used for clinical prediction, but certain aspects of their behavior are often overlooked. This article aims to illustrate these aspects and discuss the implications of plug-and-play use of tree-based models for clinical prediction. We focus on their ability to learn smooth, monotonic (i.e., a consistent predictor effect, where an increase in a predictor leads to an increase in predicted risk), and additive (i.e., each predictor contributes independently and additively to the outcome) predictor-outcome associations, and on how they behave when making predictions outside the range of observed data (extrapolation).
Study design and setting: We illustrated the behavior of plug-and-play tree-based models in a simulation study in which we sampled predictors from standard normal distributions and generated binary outcomes from a logistic function of the predictors. We then translated the findings into potential clinical implications using a real-world example of post-radiotherapy toxicity prediction. To show the generalizability of our findings, we also assessed the models' behavior in a publicly available dataset of head and neck cancer patients. For each analysis, we visualized the learned predictor-outcome associations across different sample sizes.
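The data-generating step described above can be sketched as follows. This is a minimal illustration, not the study's code: the function name `simulate`, the coefficients, the intercept, the sample size, and the seed are all assumptions chosen for the example.

```python
import math
import random


def simulate(n, betas=(1.0, 1.0), intercept=0.0, seed=0):
    """Draw n rows of standard-normal predictors and binary outcomes.

    The coefficients and intercept are illustrative assumptions, not the
    values used in the study.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # One standard-normal draw per predictor.
        x = [rng.gauss(0.0, 1.0) for _ in betas]
        linear_predictor = intercept + sum(b * xi for b, xi in zip(betas, x))
        # The logistic function maps the linear predictor to a risk in (0, 1).
        risk = 1.0 / (1.0 + math.exp(-linear_predictor))
        # The binary outcome is a Bernoulli draw with that risk.
        y = 1 if rng.random() < risk else 0
        rows.append((x, risk, y))
    return rows


data = simulate(1000)
prevalence = sum(y for _, _, y in data) / len(data)
print(f"observed outcome prevalence: {prevalence:.3f}")
```

Because the true risk is smooth, monotonic, and additive on the log-odds scale in such a setup, any steps or reversals in a fitted model's learned association can be attributed to the model rather than to the data-generating process.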
Results: In the simulation study, the models showed stepwise fluctuations in their learned continuous predictor-outcome associations, caused by the inherent categorization of continuous predictors in a decision tree. Even with a large sample size, the associations were neither smooth nor monotonic. Furthermore, because tree-based models can only split orthogonally to the axes, they struggled to learn an additive effect. Additionally, tree-based models extrapolate in a somewhat unintuitive way: they predict a constant value beyond the observed data, regardless of further increases in predictor values. Using the clinical example and case study, we highlight that the learned associations are biologically implausible and may lead to issues regarding generalizability and trustworthiness.
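The stepwise shape and the constant extrapolation can be reproduced with a very small example. The sketch below is an assumption-laden toy, not the study's analysis: it fits a single greedy one-predictor regression tree (not Random Forest or XGBoost) to noise-free logistic risks on a grid from -2 to 2, and the helper names `fit_tree` and `predict`, the slope of 2, and the depth of 3 are all chosen for illustration.

```python
import math


def sse(values):
    """Sum of squared errors around the mean of `values`."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_tree(xs, ys, depth):
    """Greedy 1-D regression tree.

    Returns a leaf (the mean of ys) or a tuple (threshold, left, right).
    """
    if depth == 0 or len(set(xs)) < 2:
        return sum(ys) / len(ys)
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2, i)
    _, threshold, i = best
    left = fit_tree([x for x, _ in pairs[:i]], [y for _, y in pairs[:i]], depth - 1)
    right = fit_tree([x for x, _ in pairs[i:]], [y for _, y in pairs[i:]], depth - 1)
    return (threshold, left, right)


def predict(tree, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


# Training data: predictor on a grid in [-2, 2], smooth logistic risk as target.
train_x = [-2 + 0.1 * i for i in range(41)]
train_y = [1 / (1 + math.exp(-2 * x)) for x in train_x]
tree = fit_tree(train_x, train_y, depth=3)

# Inside the observed range the fitted curve takes only a few distinct values.
grid_predictions = [predict(tree, -2 + 0.01 * i) for i in range(401)]
print("distinct predicted values:", len(set(grid_predictions)))

# Outside the observed range the prediction stays at the outermost leaf value.
for x in (2.0, 3.0, 10.0):
    print(x, round(predict(tree, x), 3))
```

A depth-3 tree has at most eight leaves, so the smooth target is approximated by at most eight plateaus, and any predictor value above 2 or below -2 falls into the outermost leaf. Ensembles average or sum many such step functions, which makes the steps finer but does not change the flat behavior beyond the training range.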
Conclusion: Using tree-based models in a plug-and-play manner for clinical prediction may result in undesirable predictor-outcome associations. Therefore, we recommend carefully taking their behavior into account during modeling decisions and evaluations. Further research is needed to explore the potential value of recent developments in decision tree literature, such as using constraints to incorporate prior knowledge and using soft split decision trees.
About the journal:
The Journal of Clinical Epidemiology strives to enhance the quality of clinical and patient-oriented healthcare research by advancing and applying innovative methods in conducting, presenting, synthesizing, disseminating, and translating research results into optimal clinical practice. Special emphasis is placed on training new generations of scientists and clinical practice leaders.