An ensemble approach improves the prediction of the COVID-19 pandemic in South Korea.

IF 4.5 · Zone 3 (Medicine) · Q1 PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH
Kyulhee Han, Catherine Apio, Hanbyul Song, Bogyeom Lee, Xuwen Hu, Jiwon Park, Liu Zhe, Taewan Goo, Taesung Park
Journal of Global Health, volume 15, article 04079. Published 2025-03-28. Journal Article.
DOI: https://doi.org/10.7189/jogh.15.04079
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11949510/pdf/
Citations: 0

Abstract

Background: Modelling can contribute to disease prevention and control strategies. Accurate predictions of future cases and mortality rates were essential for establishing appropriate policies during the COVID-19 pandemic. However, no single model yielded definitive conclusions, as each had specific strengths and weaknesses. Here we propose an ensemble learning approach that can offset the limitations of each model and improve prediction performance.

Methods: We generated predictions for the transmission and impact of COVID-19 in South Korea using seven individual models, including mathematical, statistical, and machine learning approaches. We integrated these predictions using three ensemble methods: stacking, average, and weighted average ensemble (WAE). We used training and test errors to measure a model's performance and selected the best covariate combinations based on the lowest training error. We then evaluated model performance using five error measures (r², weighted mean absolute percentage error (WMAPE), mean squared error (MSE), root mean squared error (RMSE), and mean absolute percentage error (MAPE)) and selected the optimal covariate combination accordingly. To validate the generalisability of our approach, we applied the same modelling framework to USA data.
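The error measures and the two averaging ensembles named in the Methods can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the WMAPE definition (total absolute error divided by total absolute actual value), the use of the coefficient of determination for r², and the inverse-training-error weighting in `weighted_average_ensemble` are common conventions assumed here, and all input numbers are hypothetical. Stacking is omitted because it requires a trained meta-learner.

```python
import math

def wmape(actual, predicted):
    """Weighted MAPE: total absolute error over total absolute actual value."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / sum(abs(a) for a in actual)

def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; undefined if any actual is 0."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))

def r_squared(actual, predicted):
    """Coefficient of determination (assumed meaning of r² here)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def average_ensemble(predictions):
    """Unweighted mean of the per-model predictions at each time point."""
    return [sum(column) / len(column) for column in zip(*predictions)]

def weighted_average_ensemble(predictions, train_errors):
    """Weighted mean; weights proportional to 1 / training error (an assumption)."""
    inverse = [1.0 / e for e in train_errors]
    weights = [w / sum(inverse) for w in inverse]
    return [sum(w * p for w, p in zip(weights, column)) for column in zip(*predictions)]

# Hypothetical daily counts and two hypothetical model forecasts.
actual = [100, 200, 300]
model_a = [110, 190, 330]
model_b = [90, 220, 290]

avg = average_ensemble([model_a, model_b])
wae = weighted_average_ensemble([model_a, model_b], train_errors=[0.10, 0.05])

print("WMAPE model A:", wmape(actual, model_a))
print("WMAPE model B:", wmape(actual, model_b))
print("WMAPE average ensemble:", wmape(actual, avg))
print("WMAPE weighted average ensemble:", wmape(actual, wae))
```

In this toy case the two models err in opposite directions, so averaging cancels part of the error. That is the intuition behind combining models, though it does not hold for every data set.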

Results: Booster shot rate + Omicron variant BA.5 rate was the most commonly selected combination of covariates. For raw data evaluated using WMAPE, the best individual models were generalised additive modelling (GAM) for the daily number of confirmed cases (0.244), time series Poisson for the daily number of confirmed deaths (0.172), and both autoregressive integrated moving average (ARIMA) and time series Poisson for the daily number of intensive care unit (ICU) patients (0.022). For smoothed data, the Holt-Winters model achieved 0.058 for daily confirmed cases, while ARIMA attained 0.058 for daily confirmed deaths and 0.013 for daily ICU patients. Among ensemble models, the stacking ensemble based on a support vector machine (SVM) achieved 0.235 for daily confirmed cases, 0.118 for daily deaths, and 0.019 for daily ICU patients on raw data. For smoothed data, the average ensemble and weighted average ensemble achieved 0.060 for daily confirmed cases and 0.013 for daily ICU patients. The ensemble models also generalised well when applied to data from the USA.

Conclusions: No single model performed consistently well. While the ensemble models did not always provide the best predictions, a comparison of first-best and second-best models showed that they performed considerably better than the single models. When an ensemble model was not the best-performing model, its performance was never far from that of the best single model: the mean and variance of the error measures show that ensemble models provided stable predictions, with little variation in performance compared with single models. These results can be used to inform policymaking during future pandemics.
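The stability argument above, comparing the mean and variance of error measures across models, could be checked along the following lines. This is only a sketch: the model names and error values are hypothetical placeholders, not figures from the study, and the use of population variance is an assumption.

```python
from statistics import mean, pvariance

# Hypothetical WMAPE values for three prediction targets (cases, deaths, ICU).
# These numbers are invented for illustration and are NOT results from the paper.
errors_by_model = {
    "single_model_a": [0.10, 0.40, 0.05],
    "single_model_b": [0.35, 0.12, 0.30],
    "ensemble": [0.14, 0.16, 0.09],
}

def stability_summary(errors):
    """Return {model: (mean error, population variance of error)}."""
    return {name: (mean(values), pvariance(values)) for name, values in errors.items()}

summary = stability_summary(errors_by_model)

# The model with the smallest variance is the most stable across targets.
most_stable = min(summary, key=lambda name: summary[name][1])

for name, (avg_error, var_error) in summary.items():
    print(f"{name}: mean={avg_error:.3f}, variance={var_error:.5f}")
print("Most stable:", most_stable)
```

A low variance alone does not make a model good, so the mean is reported alongside it. A stable model is only useful if its average error is also competitive.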

Source journal: Journal of Global Health (PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH)
CiteScore: 6.10
Self-citation rate: 2.80%
Articles published: 240
Review time: 6 weeks
About the journal: Journal of Global Health is a peer-reviewed journal published by the Edinburgh University Global Health Society, a not-for-profit organization registered in the UK. We publish editorials, news, viewpoints, original research and review articles in two issues per year.