Regularized regression outperforms trees for predicting cognitive function in the Health and Retirement Study

Kyle Masato Ishikawa, Deborah Taira, Joseph Keaweʻaimoku Kaholokula, Matthew Uechi, James Davis, Eunjung Lim

Machine Learning with Applications, Volume 21, Article 100694 (published 2025-07-03). DOI: 10.1016/j.mlwa.2025.100694
Background
Generalized linear models have been favored in healthcare research due to their interpretability. In contrast, tree-based models, such as random forests or boosted trees, are often preferred in machine learning (ML) and commercial settings due to their strong predictive performance. For clinical applications, however, model interpretability remains essential for actionable results and patient understanding. This study used ML to detect cognitive decline, with the goals of supporting timely screening and uncovering associations with psychosocial determinants. All models were interpreted to enhance the transparency and understanding of their predictions.
Methods
Data from the 2018 to 2020 Health and Retirement Study were used to create three linear regression models and three tree-based models. Ten percent of the sample was withheld for estimating performance, and model tuning used five-fold cross-validation with two repeats. Survey frequency weights were applied during tuning, training, and final evaluation. Model performance was evaluated using RMSE and R², and interpretability was assessed via coefficients, variable importance, and decision trees.
Results
The elastic net model had the best performance (RMSE = 3.520, R² = 0.435), followed by standard linear regression, boosted trees, random forest, multivariate adaptive regression splines, and lastly, decision trees. Across all models, baseline cognitive function and frequency of computer use were the most influential predictors.
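The cross-model comparison of influential predictors can be made concrete by ranking coefficient magnitudes for the linear model alongside impurity-based importances for a forest, as the Methods section describes. This sketch uses synthetic data and placeholder feature names, not the study's variables:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet

# Synthetic data; "x0"..."x4" are placeholder names, not HRS variables.
X, y = make_regression(n_samples=500, n_features=5, noise=5.0, random_state=1)
names = [f"x{i}" for i in range(X.shape[1])]

enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
rf = RandomForestRegressor(n_estimators=200, random_state=1).fit(X, y)

# Rank predictors: absolute coefficient for the elastic net,
# impurity-based importance for the random forest.
enet_rank = sorted(zip(names, np.abs(enet.coef_)), key=lambda t: -t[1])
rf_rank = sorted(zip(names, rf.feature_importances_), key=lambda t: -t[1])
```

Agreement between the two rankings (as reported here for baseline cognition and computer use) is a useful sanity check that an influential predictor is not an artifact of one model family.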
Conclusion
Elastic net regression outperformed tree-based models, suggesting that cognitive outcomes may be best modeled with additive linear relationships. Its ability to shrink or remove correlated and weak predictors contributed to its balance of interpretability and predictive performance for this particular dataset.