Hybrid ARIMA-LSTM for COVID-19 forecasting: a comparative AI modeling study
Al Mahmud, Syed Husni Noor Syed Hatim Noor, Kamarul Imran Musa, Firdaus Mohamad Hamzah, Zainab Mat Yudin, Noorshaida Kamaruddin, Ashwini M Madawana, Mohamad Arif Awang Nawi
PeerJ Computer Science, volume 11, e3195. Published 2025-09-19. DOI: 10.7717/peerj-cs.3195 (https://doi.org/10.7717/peerj-cs.3195). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453849/pdf/
Abstract
Pandemics present critical challenges to global health systems, economies, and societal structures, necessitating the development of accurate forecasting models for effective intervention and resource allocation. Classical statistical models such as the autoregressive integrated moving average (ARIMA) have been widely employed in epidemiological forecasting; however, they struggle to capture the nonlinear trends and dynamic fluctuations inherent in pandemic data. Conversely, deep learning models such as long short-term memory (LSTM) networks demonstrate strong capabilities in modeling complex dependencies but often require substantial data and computational resources. To boost forecasting precision, hybrid models such as ARIMA-LSTM integrate the advantages of traditional and deep learning methods. This study evaluates and compares the performance of ARIMA, LSTM, and hybrid ARIMA-LSTM models in predicting pandemic trends, using COVID-19 data from the Malaysian Ministry of Health as a case study. The dataset covers the period from 4 January 2021 to 18 September 2021, and model performance is evaluated using key metrics, including mean squared error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), root mean squared error (RMSE), relative root mean squared error (RRMSE), normalized root mean squared error (NRMSE), and the coefficient of determination (R²). The results demonstrate that ARIMA performs poorly in capturing pandemic trends, while LSTM improves forecasting accuracy. However, the hybrid ARIMA-LSTM model consistently achieves the lowest error rates, confirming the advantage of integrating statistical and deep learning methodologies. All findings support the adoption of hybrid modeling approaches for pandemic forecasting, contributing to more accurate and reliable predictive analytics in epidemiology. Future research should investigate the generalizability of hybrid models across various infectious diseases and integrate additional real-time external variables to improve forecasting reliability.
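The evaluation metrics listed in the abstract can be illustrated with a minimal Python sketch. The abstract names the metrics but does not define them, so the normalizations here are assumptions: RRMSE is taken as RMSE divided by the mean of the observed values, and NRMSE as RMSE divided by the observed range (max minus min). Other conventions exist and the study may use a different one. The function name `forecast_metrics` and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def forecast_metrics(actual, predicted):
    """Compute MSE, MAE, MAPE, RMSE, RRMSE, NRMSE and R^2 for two sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")

    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)

    # MAPE is undefined when any observed value is zero; report NaN in that case.
    if any(a == 0 for a in actual):
        mape = float("nan")
    else:
        mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    mean_actual = sum(actual) / n
    value_range = max(actual) - min(actual)

    # Assumption: RRMSE normalizes RMSE by the mean of the observed series.
    rrmse = rmse / mean_actual if mean_actual != 0 else float("nan")
    # Assumption: NRMSE normalizes RMSE by the range of the observed series.
    nrmse = rmse / value_range if value_range != 0 else float("nan")

    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot if ss_tot != 0 else float("nan")

    return {
        "MSE": mse,
        "MAE": mae,
        "MAPE": mape,
        "RMSE": rmse,
        "RRMSE": rrmse,
        "NRMSE": nrmse,
        "R2": r2,
    }


if __name__ == "__main__":
    # Made-up daily counts for illustration only; not data from the study.
    observed = [100, 200, 300, 400]
    forecast = [110, 190, 310, 390]
    for name, value in forecast_metrics(observed, forecast).items():
        print(f"{name}: {value:.4f}")
```

Applying the same function to the test-period forecasts of each model (ARIMA, LSTM, hybrid) gives a like-for-like comparison. Lower values are better for every metric except R², where a value closer to 1 indicates a better fit.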
About the journal:
PeerJ Computer Science is the new open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.