Plug-and-play use of tree-based methods: Consequences for clinical prediction modelling.

IF 7.3 | Zone 2 (Medicine) | JCR Q1: HEALTH CARE SCIENCES & SERVICES

Journal of Clinical Epidemiology Pub Date : 2025-05-19 DOI:10.1016/j.jclinepi.2025.111834

Lotta M Meijerink, Ewoud Schuit, Karel G M Moons, Artuur M Leeuwenberg


Citations: 0

Abstract


Plug-and-play use of tree-based methods: Consequences for clinical prediction modelling.

Objective: Tree-based models such as Random Forest and XGBoost are increasingly being used for clinical prediction, but certain aspects of their behavior are often overlooked. This article aims to illustrate these aspects and discuss the implications of plug-and-play use of tree-based models for clinical prediction. We focus on their ability to learn smooth, monotonic (i.e., consistent predictor effect where an increase in predictor leads to an increase in predicted risk), and additive predictor-outcome associations (i.e., each predictor independently and additively contributes to the outcome), and how they behave when making predictions outside the range of observed data (extrapolation).

Study design and setting: We illustrated the behavior of plug-and-play tree-based models in a simulation study in which predictors were sampled from standard normal distributions and binary outcomes were determined by the logistic function of the predictors, and we translated this into potential clinical implications using a real-world clinical example of post-radiotherapy toxicity prediction. To show the generalizability of our findings, we also assessed the models' behavior in a publicly available dataset of head and neck cancer patients. For each analysis, we visualized the learned predictor-outcome associations across different sample sizes.
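The simulation setup described above can be sketched as follows. This is a minimal illustration and not the authors' code: the abstract does not state the number of predictors, the coefficients, or the sample sizes, so `coefs`, `intercept`, `n`, and the function name `simulate` are all assumptions.

```python
import math
import random

def simulate(n, coefs=(1.0, 1.0), intercept=0.0, seed=0):
    """Draw n rows of standard-normal predictors and logistic binary outcomes.

    coefs and intercept are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in coefs]
        # Additive linear predictor on the log-odds scale.
        lp = intercept + sum(b * x for b, x in zip(coefs, row))
        p = 1.0 / (1.0 + math.exp(-lp))  # logistic function
        X.append(row)
        y.append(1 if rng.random() < p else 0)
    return X, y

X, y = simulate(2000)
prevalence = sum(y) / len(y)
```

Because the true association here is smooth, monotonic, and additive on the log-odds scale, any steps or reversals in a fitted model's predictions come from the model rather than from the data-generating process.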

Results: In the simulation study, the models show stepwise fluctuations in their learned continuous predictor-outcome associations, caused by the inherent categorization of continuous predictors in a decision tree. Even with a large data size, the associations were not smooth or monotonic. Furthermore, because tree-based models can only split orthogonally to the axes, they struggle to learn an additive effect. Additionally, tree-based models extrapolate in a somewhat unintuitive way, by predicting a constant value beyond the observed data, regardless of further increases in predictor values. Using the clinical example and case study, we highlight that the learned associations are biologically implausible and may lead to issues regarding generalizability and trustworthiness.
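The stepwise shape and the constant extrapolation described above can be reproduced with a very small regression tree. The sketch below is a hand-written single tree on one predictor, not Random Forest or XGBoost and not the authors' implementation; the depth, the minimum node size of 20, the sample size, and the seed are assumptions chosen for illustration.

```python
import math
import random

def fit_tree(xs, ys, depth):
    """Fit a 1-D regression tree by recursive least-squares splitting."""
    mean = sum(ys) / len(ys)
    if depth == 0 or len(xs) < 20:
        return mean  # leaf: one constant predicted risk
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    sx = [xs[i] for i in order]
    sy = [ys[i] for i in order]
    total, n = sum(sy), len(sy)
    best, left_sum = None, 0.0
    for i in range(1, n):
        left_sum += sy[i - 1]
        if sx[i] == sx[i - 1]:
            continue  # cannot split between identical values
        # Maximising this score is equivalent to minimising squared error.
        score = left_sum ** 2 / i + (total - left_sum) ** 2 / (n - i)
        if best is None or score > best[0]:
            best = (score, i)
    if best is None:
        return mean
    i = best[1]
    threshold = (sx[i - 1] + sx[i]) / 2
    return (threshold,
            fit_tree(sx[:i], sy[:i], depth - 1),
            fit_tree(sx[i:], sy[i:], depth - 1))

def predict(tree, x):
    # Walk down axis-aligned splits until a leaf (a plain float) is reached.
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree

rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ys = [1 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0 for x in xs]
tree = fit_tree(xs, ys, depth=4)

grid = [i / 10 for i in range(-30, 31)]
preds = [predict(tree, x) for x in grid]
n_steps = len(set(preds))  # at most 2**4 distinct predicted risks
far_right = (predict(tree, 10.0), predict(tree, 1000.0))
```

Every value of the predictor that falls in the same leaf receives the same predicted risk, so the learned curve is a step function, and any value beyond the last split threshold receives the prediction of the outermost leaf. Neighbouring leaf means are estimated independently, so nothing in the fitting procedure forces them to be ordered.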

Conclusion: Using tree-based models in a plug-and-play manner for clinical prediction may result in undesirable predictor-outcome associations. Therefore, we recommend carefully taking their behavior into account during modeling decisions and evaluations. Further research is needed to explore the potential value of recent developments in decision tree literature, such as using constraints to incorporate prior knowledge and using soft split decision trees.
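To give a rough sense of what a soft split means, the sketch below contrasts a hard one-split tree with a soft version that blends the two leaf values through a sigmoid gate. It illustrates the general idea only and is not a method taken from the paper; the threshold, leaf values, and `temperature` parameter are hypothetical.

```python
import math

def hard_stump(x, threshold=0.0, left=0.2, right=0.8):
    # Hard split: the prediction jumps from one leaf value to the other.
    return left if x <= threshold else right

def soft_stump(x, threshold=0.0, left=0.2, right=0.8, temperature=0.5):
    # Gate in (0, 1): the share of the prediction routed to the right leaf.
    gate = 1.0 / (1.0 + math.exp(-(x - threshold) / temperature))
    return (1.0 - gate) * left + gate * right

grid = [i / 10 for i in range(-30, 31)]
hard = [hard_stump(x) for x in grid]
soft = [soft_stump(x) for x in grid]
```

With a single split and a right leaf value above the left one, the soft version is smooth and increasing in the predictor, whereas the hard version takes only two values. As `temperature` shrinks toward zero, the soft stump approaches the hard one.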


Source journal

Journal of Clinical Epidemiology (Medicine: Public, Environmental & Occupational Health)

CiteScore: 12.00

Self-citation rate: 6.90%

Publication volume: 320

Review time: 44 days

About the journal: The Journal of Clinical Epidemiology strives to enhance the quality of clinical and patient-oriented healthcare research by advancing and applying innovative methods in conducting, presenting, synthesizing, disseminating, and translating research results into optimal clinical practice. Special emphasis is placed on training new generations of scientists and clinical practice leaders.