CCLR-DL: A novel statistics and deep learning hybrid method for feature selection and forecasting healthcare demand

IF 4.8 | Zone 2, Medicine | JCR Q1: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computer methods and programs in biomedicine | Pub Date: 2025-09-07 | DOI: 10.1016/j.cmpb.2025.109057

Guillem Hernández Guillamet, Francesc López Seguí, Josep Vidal Alaball, Beatriz López

Volume 272, Article 109057 | Source: https://www.sciencedirect.com/science/article/pii/S0169260725004742

Citations: 0

Abstract



Background and Objective:

Hybrid forecasting methods aim to overcome the limitations of classical statistical approaches and deep learning models. While statistical methods provide interpretability, they often lack predictive power. Conversely, deep learning models achieve high accuracy but act as “black boxes.” This study introduces the Comprehensive Cross-Correlation and Lagged Linear Regression Deep Learning (CCLR-DL) framework, combining statistical and deep learning techniques to enhance both forecasting accuracy and interpretability. Unlike existing hybrid methods that combine statistical filtering with deep learning, CCLR-DL integrates causal statistical selection with neural forecasting, producing interpretable predictors and consistently achieving higher accuracy than models without feature selection or other standard baselines.

Methods:

The CCLR-DL framework integrates cross-correlation analysis, lagged multiple linear regression, and Granger causality testing with advanced deep learning architectures. This dual-phase approach first identifies causally significant predictors and then feeds them into a deep learning model for multivariate time series forecasting. The framework was validated using a real-world dataset of clinical visits and diagnoses from 6.3 million individuals collected over 10 years.
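As a rough illustration of the first screening step only, the sketch below ranks candidate series by their lagged cross-correlation with a target series. It is not the authors' implementation: the function names, the `max_lag` and `threshold` parameters, and the selection rule are assumptions made for this example, and the lagged regression and Granger causality stages described above are not shown (a library routine such as `grangercausalitytests` in statsmodels could be applied to the survivors).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def best_lag(candidate, target, max_lag):
    """Return (lag, r) for the lag at which candidate[t - lag] correlates most with target[t]."""
    best = (0, 0.0)
    for lag in range(1, max_lag + 1):
        # Pair candidate[i] with target[i + lag]: the candidate leads the target by `lag` steps.
        r = pearson(candidate[:-lag], target[lag:])
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


def screen_candidates(candidates, target, max_lag=6, threshold=0.5):
    """Keep candidates whose strongest lagged correlation reaches the (hypothetical) threshold."""
    selected = {}
    for name, series in candidates.items():
        lag, r = best_lag(series, target, max_lag)
        if abs(r) >= threshold:
            selected[name] = (lag, r)
    return selected


if __name__ == "__main__":
    # Synthetic data: "leading" is the target shifted two steps earlier.
    base = [5, 1, 9, 3, 7, 2, 8, 6, 4, 10, 0, 11, 3, 9, 1, 7]
    target, leading = base[:-2], base[2:]
    print(screen_candidates({"leading": leading, "flat": [4] * 14}, target, max_lag=4))
```

Only positive lags are searched because a predictor is useful for forecasting only when it moves before the target does.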

Results:

In the evaluated setting, the CCLR-DL framework outperformed baseline models, achieving an average Root Mean Square Error (RMSE) improvement of 19.8% over univariate models, 60.1% over no feature selection, and 51.9% over random selection. The causality phase ensured that all selected predictors demonstrated a significant Granger-causal (GC) relationship. Simpler recurrent architectures, particularly bidirectional Long Short-Term Memory units (BiLSTM), yielded the most accurate forecasts by effectively capturing nonlinear temporal dependencies.
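To make the reported percentages concrete, the sketch below computes RMSE and a relative improvement over a baseline. The abstract does not state the exact formula, so the definition used here, (baseline RMSE minus model RMSE) divided by baseline RMSE, is an assumption, and the numbers in the usage lines are illustrative rather than taken from the paper.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rmse_improvement_pct(rmse_baseline, rmse_model):
    """Relative RMSE reduction versus a baseline, in percent (assumed definition)."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline


if __name__ == "__main__":
    # Illustrative weekly visit counts, not data from the study.
    actual = [120, 135, 128, 140]
    baseline_forecast = [110, 150, 120, 155]
    model_forecast = [118, 138, 125, 144]
    gain = rmse_improvement_pct(rmse(actual, baseline_forecast), rmse(actual, model_forecast))
    print(f"RMSE improvement over baseline: {gain:.1f}%")
```

Under this definition, an improvement of 19.8% means the model's RMSE is 80.2% of the baseline's.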

Conclusions:

By addressing the challenges of both prediction accuracy and model transparency, the CCLR-DL framework offers a new approach for high-dimensional, multivariate time series forecasting. In healthcare settings, it may enable decision-makers to anticipate demand shifts with greater reliability, allowing earlier staff scheduling, more efficient resource allocation, and reduced waiting times. In our evaluation, it consistently outperformed baseline strategies, with improvements that correspond to thousands of patient visits being forecast more accurately across large populations.


Source journal

Computer methods and programs in biomedicine (Engineering and Technology - Engineering: Biomedical)

CiteScore: 12.30
Self-citation rate: 6.60%
Publications: 601
Review time: 135 days

About the journal: Computer Methods and Programs in Biomedicine aims to encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; to promote the eventual distribution of demonstrable software, avoiding duplication of effort; to provide a forum for discussion and improvement of existing software; and to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. The journal covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.