Incorporating longitudinal variability in prediction models: A comparison of machine learning and logistic regression in a cohort study with long follow-up.
L M de Groot, J W R Twisk, A A L Kok, M W Heymans
Annals of Epidemiology, pp. 51-65. Published 2025-07-26. DOI: 10.1016/j.annepidem.2025.07.060
Purpose: Clinical prediction models benefit from longitudinal data. While the predictive value of a predictor's mean and its change over time is well established, the role of variability around this change is underexplored. Machine learning methods can be effective for analyzing longitudinal data with long follow-up periods. This study evaluated the predictive value of mean, change, and variability, comparing Random Forest, Lasso regression, and logistic regression.
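The abstract does not state how mean, change, and variability were operationalised. The Python sketch below assumes that change is the ordinary least-squares slope of a participant's repeated measurements over time, and that variability is the residual standard deviation around that slope. Both definitions are assumptions and may differ from the authors'. The function name `summarize_trajectory` is hypothetical.

```python
def summarize_trajectory(times, values):
    """Summarise one participant's repeated measurements of a single predictor.

    Returns (mean, slope, residual_sd). Assumed definitions (hypothetical,
    not taken from the paper): 'change' is the OLS slope of value on time,
    'variability' is the residual SD around that fitted line.
    """
    n = len(times)
    if n != len(values):
        raise ValueError("times and values must have the same length")
    if n < 3:
        raise ValueError("at least three measurements are needed to estimate residual variability")
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("measurement times must not all be equal")
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    residuals = [y - (intercept + slope * t) for t, y in zip(times, values)]
    # Divide by n - 2 because two parameters (intercept and slope) were estimated.
    residual_sd = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return y_bar, slope, residual_sd


# Two hypothetical participants measured at four waves (times in arbitrary units).
waves = [0, 1, 2, 3]
stable = summarize_trajectory(waves, [10, 11, 12, 13])
fluctuating = summarize_trajectory(waves, [11, 10, 11, 14])
print("stable:     ", stable)
print("fluctuating:", fluctuating)
```

In this example, both hypothetical participants have a mean of 11.5 and a slope of 1 unit per wave. The first has a residual SD of 0, and the second has a residual SD of about 1.41. A model that uses only the mean and the slope cannot tell these two trajectories apart.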
Methods: We compared models that included only the mean and change of each predictor with models that also incorporated variability. Predictor selection, interpretability, and performance were compared across methods. Performance was assessed using AUC, sensitivity, specificity, PPV, NPV, and calibration. Data from the Longitudinal Aging Study Amsterdam were used to predict depression from 81 longitudinal parameters. Models were trained on 70% of the data and validated on the remaining 30%. To assess robustness, the analyses were repeated over 500 random splits, and aggregated results were reported.
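The evaluation scheme described in the Methods can be sketched in plain Python. This scheme uses random 70/30 splits repeated 500 times and computes AUC and threshold-based metrics on each test set. The sketch rests on several assumptions:

- `fit` is a hypothetical placeholder for any of the three methods. It takes training rows and labels and returns a scoring function.
- AUC is computed as the Mann-Whitney probability that a case outscores a non-case.
- The 0.5 cut-off for sensitivity, specificity, PPV, and NPV is arbitrary.
- Results are aggregated by their mean, because the abstract does not say which summary the authors used.

```python
import random
from statistics import mean


def auc(labels, scores):
    """Mann-Whitney AUC: P(score of a random case > score of a random non-case), ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def _ratio(num, den):
    # Undefined ratios (empty denominator) are reported as NaN.
    return num / den if den else float("nan")


def threshold_metrics(labels, scores, cutoff=0.5):
    """Sensitivity, specificity, PPV and NPV when scores >= cutoff are classified as cases."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def repeated_split_auc(X, y, fit, n_splits=500, train_frac=0.7, seed=0):
    """Mean test-set AUC over repeated random train/test splits, plus the per-split values."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    test_aucs = []
    for _ in range(n_splits):
        rng.shuffle(idx)  # a fresh random split on every repetition
        cut = round(train_frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        test_aucs.append(auc([y[i] for i in test], [predict(X[i]) for i in test]))
    return mean(test_aucs), test_aucs


# Toy check of the metrics on four hypothetical predictions.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 0.75
print(threshold_metrics(labels, scores))
```

A plain random split can leave a test set without any cases when events are rare, and `auc` then raises an error. Stratifying each split on the outcome is a common remedy.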
Results: Including variability improved AUCs for all methods. Predictor selection overlapped across models, and regression coefficients aligned with Random Forest partial dependence plots. Lasso showed the highest training AUC but poorer test performance, while logistic regression and Random Forest showed more stable results. Calibration was acceptable, though predicted risks remained below 0.6.
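The Results report acceptable calibration but predicted risks that stayed below 0.6. The abstract does not say how calibration was assessed. One common way to inspect it is to group predictions into equal-width risk bins and compare the mean predicted risk with the observed event rate in each bin. The minimal sketch below does this and is not necessarily the authors' procedure. The function name and the example data are hypothetical. Bins above the highest predicted risk simply stay empty, which is how a ceiling such as 0.6 appears in a calibration plot.

```python
def calibration_table(labels, probs, n_bins=10):
    """Compare mean predicted risk with observed event rate in equal-width risk bins.

    Returns rows (bin_low, bin_high, n, mean_predicted, observed_rate) for non-empty bins.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        if not 0.0 <= p <= 1.0:
            raise ValueError("predicted risks must lie in [0, 1]")
        k = min(int(p * n_bins), n_bins - 1)  # a risk of exactly 1.0 goes into the top bin
        bins[k].append((y, p))
    rows = []
    for k, members in enumerate(bins):
        if not members:
            continue  # e.g. every bin above the highest predicted risk
        n = len(members)
        mean_pred = sum(p for _, p in members) / n
        observed = sum(y for y, _ in members) / n
        rows.append((k / n_bins, (k + 1) / n_bins, n, mean_pred, observed))
    return rows


# Hypothetical predictions for illustration only (not data from the study).
labels = [0, 1, 0, 1, 1]
probs = [0.05, 0.08, 0.25, 0.55, 0.52]
for low, high, n, pred, obs in calibration_table(labels, probs):
    print(f"[{low:.1f}, {high:.1f}): n={n}, predicted={pred:.3f}, observed={obs:.3f}")
```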
Conclusion: Machine learning methods did not outperform logistic regression. Nonetheless, incorporating the variability of longitudinal predictors improves prediction, especially when predictors are expected to change over time, e.g., in ageing populations.
About the journal:
The journal emphasizes the application of epidemiologic methods to issues that affect the distribution and determinants of human illness in diverse contexts. Its primary focus is on chronic and acute conditions of diverse etiologies and of major importance to clinical medicine, public health, and health care delivery.