Hybrid ARIMA-LSTM for COVID-19 forecasting: a comparative AI modeling study.

IF 2.5 · Zone 4 (Computer Science) · JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
PeerJ Computer Science Pub Date : 2025-09-19 eCollection Date: 2025-01-01 DOI:10.7717/peerj-cs.3195
Al Mahmud, Syed Husni Noor Syed Hatim Noor, Kamarul Imran Musa, Firdaus Mohamad Hamzah, Zainab Mat Yudin, Noorshaida Kamaruddin, Ashwini M Madawana, Mohamad Arif Awang Nawi
Volume 11, e3195. Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453849/pdf/
Citations: 0

Abstract

Pandemics present critical challenges to global health systems, economies, and societal structures, necessitating the development of accurate forecasting models for effective intervention and resource allocation. Classical statistical models such as the autoregressive integrated moving average (ARIMA) have been widely employed in epidemiological forecasting; however, they struggle to capture the nonlinear trends and dynamic fluctuations inherent in pandemic data. Conversely, deep learning models such as long short-term memory (LSTM) networks demonstrate strong capabilities in modeling complex dependencies but often require substantial data and computational resources. To boost forecasting precision, hybrid models such as ARIMA-LSTM integrate the advantages of traditional and deep learning methods. This study evaluates and compares the performance of ARIMA, LSTM, and hybrid ARIMA-LSTM models in predicting pandemic trends, using COVID-19 data from the Malaysian Ministry of Health as a case study. The dataset covers the period from 4 January 2021 to 18 September 2021, and model performance is evaluated using key metrics, including mean squared error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), root mean squared error (RMSE), relative root mean squared error (RRMSE), normalized root mean squared error (NRMSE), and the coefficient of determination (R²). The results demonstrate that ARIMA performs poorly in capturing pandemic trends, while LSTM improves forecasting accuracy. However, the hybrid ARIMA-LSTM model consistently achieves the lowest error rates, confirming the advantage of integrating statistical and deep learning methodologies. All findings support the adoption of hybrid modeling approaches for pandemic forecasting, contributing to more accurate and reliable predictive analytics in epidemiology. Future research should investigate the generalizability of hybrid models across various infectious diseases and integrate additional real-time external variables to improve forecasting reliability.
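
The abstract names seven evaluation metrics without giving their formulas. The sketch below shows one common way to compute them with NumPy. It is an illustration under stated assumptions, not the authors' code: the function name `forecast_metrics` is hypothetical, and the normalisations used for RRMSE (RMSE divided by the mean of the observations) and NRMSE (RMSE divided by the range of the observations) are only one of several conventions in the literature; the paper may define them differently.

```python
import numpy as np


def forecast_metrics(y_true, y_pred):
    """Return the seven error metrics named in the abstract as a dict.

    Hypothetical helper for illustration. RRMSE and NRMSE follow assumed
    conventions (mean- and range-normalised RMSE respectively).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    mse = float(np.mean(err ** 2))
    rmse = float(np.sqrt(mse))
    mae = float(np.mean(np.abs(err)))

    # MAPE is undefined if any observed value is zero; the caller is
    # expected to pass strictly non-zero observations (e.g. case counts > 0).
    mape = float(np.mean(np.abs(err / y_true)) * 100.0)

    # Assumed normalisations: mean for RRMSE, range (max - min) for NRMSE.
    rrmse = rmse / float(np.mean(y_true))
    nrmse = rmse / float(np.max(y_true) - np.min(y_true))

    # Coefficient of determination: R^2 = 1 - SS_res / SS_tot.
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - np.mean(y_true)) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    return {
        "MSE": mse,
        "MAE": mae,
        "MAPE": mape,
        "RMSE": rmse,
        "RRMSE": rrmse,
        "NRMSE": nrmse,
        "R2": r2,
    }


if __name__ == "__main__":
    # Made-up numbers purely to exercise the function; not data from the study.
    observed = [100, 200, 300, 400]
    predicted = [110, 190, 310, 390]
    for name, value in forecast_metrics(observed, predicted).items():
        print(f"{name}: {value:.4f}")
```

Applying the same function to the held-out forecasts of each model (ARIMA, LSTM, hybrid) would give a like-for-like comparison table, provided all three are scored on the same test window.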

Source journal: PeerJ Computer Science (Computer Science: General Computer Science)
CiteScore: 6.10
Self-citation rate: 5.30%
Articles published: 332
Review time: 10 weeks
Journal description: PeerJ Computer Science is the new open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.