Hybrid ARIMA-LSTM for COVID-19 forecasting: a comparative AI modeling study.

IF 2.5; Zone 4 (Computer Science); JCR Q2, Computer Science, Artificial Intelligence

PeerJ Computer Science. Pub Date: 2025-09-19. eCollection Date: 2025-01-01. DOI: 10.7717/peerj-cs.3195

Al Mahmud, Syed Husni Noor Syed Hatim Noor, Kamarul Imran Musa, Firdaus Mohamad Hamzah, Zainab Mat Yudin, Noorshaida Kamaruddin, Ashwini M Madawana, Mohamad Arif Awang Nawi

PeerJ Computer Science, volume 11, e3195. Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453849/pdf/

Citations: 0

Abstract

Pandemics present critical challenges to global health systems, economies, and societal structures, necessitating the development of accurate forecasting models for effective intervention and resource allocation. Classical statistical models such as the autoregressive integrated moving average (ARIMA) have been widely employed in epidemiological forecasting; however, they struggle to capture the nonlinear trends and dynamic fluctuations inherent in pandemic data. Conversely, deep learning models such as long short-term memory (LSTM) networks demonstrate strong capabilities in modeling complex dependencies but often require substantial data and computational resources. To boost forecasting precision, hybrid models such as ARIMA-LSTM integrate the advantages of traditional and deep learning methods. This study evaluates and compares the performance of ARIMA, LSTM, and hybrid ARIMA-LSTM models in predicting pandemic trends, using COVID-19 data from the Malaysian Ministry of Health as a case study. The dataset covers the period from 4 January 2021 to 18 September 2021, and model performance is evaluated using key metrics, including mean squared error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), root mean squared error (RMSE), relative root mean squared error (RRMSE), normalized root mean squared error (NRMSE), and the coefficient of determination (R²). The results demonstrate that ARIMA performs poorly in capturing pandemic trends, while LSTM improves forecasting accuracy. However, the hybrid ARIMA-LSTM model consistently achieves the lowest error rates, confirming the advantage of integrating statistical and deep learning methodologies. All findings support the adoption of hybrid modeling approaches for pandemic forecasting, contributing to more accurate and reliable predictive analytics in epidemiology. 
Future research should investigate the generalizability of hybrid models across various infectious diseases and integrate additional real-time external variables to improve forecasting reliability.
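
The abstract names seven evaluation metrics but does not give their formulas. The following is a minimal sketch of how they could be computed, not the authors' implementation. The function name `forecast_metrics` and the sample values are hypothetical. MSE, MAE, MAPE, RMSE, and R² follow their standard definitions. RRMSE and NRMSE have several conventions in the literature; this sketch assumes RRMSE is RMSE divided by the mean of the observed values and NRMSE is RMSE divided by their range, which may differ from the definitions used in the paper.

```python
import math


def forecast_metrics(actual, predicted):
    """Compute seven forecast error metrics for two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")

    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n

    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares

    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an observed value is zero; this sketch assumes none are.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    rmse = math.sqrt(mse)
    # Assumption: RRMSE normalizes RMSE by the mean of the observed values.
    rrmse = rmse / mean_actual
    # Assumption: NRMSE normalizes RMSE by the range of the observed values.
    nrmse = rmse / (max(actual) - min(actual))
    r2 = 1.0 - ss_res / ss_tot

    return {
        "MSE": mse,
        "MAE": mae,
        "MAPE": mape,
        "RMSE": rmse,
        "RRMSE": rrmse,
        "NRMSE": nrmse,
        "R2": r2,
    }


# Made-up illustrative values, not data from the study.
observed = [100.0, 200.0, 300.0, 400.0]
forecast = [110.0, 190.0, 310.0, 390.0]
for name, value in forecast_metrics(observed, forecast).items():
    print(f"{name}: {value:.4f}")
```

Lower values are better for every metric except R², where values closer to 1 indicate a better fit. Because MAPE, RRMSE, and NRMSE are scale-free, they are the ones suited to comparing models across series of different magnitude.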


Source journal

PeerJ Computer Science (Computer Science: General Computer Science)

CiteScore: 6.10

Self-citation rate: 5.30%

Articles published: 332

Review time: 10 weeks

Journal description: PeerJ Computer Science is the new open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.