Cross-sectional analysis and data-driven forecasting of confirmed COVID-19 cases.

Nan Jing, Zijing Shi, Yi Hu, Ji Yuan
Journal: Applied Intelligence (Dordrecht, Netherlands), 52(3), pp. 3303-3318
Published: 2022-01-01 (Epub 2021-07-05)
DOI: 10.1007/s10489-021-02616-8 (https://doi.org/10.1007/s10489-021-02616-8)
Citations: 5

Abstract

The coronavirus disease 2019 (COVID-19) is rapidly becoming one of the leading causes of mortality worldwide. Previous work has built various models to study the spread characteristics and trends of the COVID-19 pandemic. Nevertheless, because information and data sources are limited, understanding of the spread and impact of the pandemic remains restricted. This paper therefore models not only daily historical time-series data of COVID-19 but also regional attributes, e.g., geographic and local factors, which may have played an important role in the number of confirmed COVID-19 cases in certain regions. On this basis, the study conducts a comprehensive cross-sectional analysis and data-driven forecasting of the pandemic. The critical features, which have a significant influence on the infection rate of COVID-19, are determined using the XGB (eXtreme Gradient Boosting) algorithm and SHAP (SHapley Additive exPlanations), and the results are compared with those of the RF (Random Forest) and LGB (Light Gradient Boosting) models. To forecast the number of confirmed COVID-19 cases more accurately, a Dual-Stage Attention-Based Recurrent Neural Network (DA-RNN) is applied. This model performs better than SVR (Support Vector Regression) and the encoder-decoder network on the experimental dataset. Model performance is evaluated with three statistical metrics: MAE, RMSE and R². This study is expected to serve as a meaningful reference for the control and prevention of the COVID-19 pandemic.
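The abstract names MAE, RMSE and R² as the evaluation metrics but does not give their formulas. The following is a minimal sketch, assuming the standard textbook definitions (mean absolute error, root mean squared error, and the coefficient of determination); the function name `regression_metrics` and the sample numbers are illustrative and do not come from the paper.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return MAE, RMSE and R^2 for paired observations and forecasts.

    Assumes the standard definitions; the paper's exact evaluation
    procedure is not described in the abstract.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: average magnitude of the forecast errors.
    mae = sum(abs(e) for e in errors) / n

    # RMSE: square root of the mean squared error; penalises large misses more.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R^2: 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when the observed series is constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "rmse": rmse, "r2": r2}


if __name__ == "__main__":
    # Made-up daily case counts, for illustration only.
    observed = [100.0, 120.0, 150.0, 170.0]
    forecast = [105.0, 118.0, 140.0, 175.0]
    print(regression_metrics(observed, forecast))
```

Lower MAE and RMSE and a higher R² indicate a better fit. R² can be negative when a forecast is worse than simply predicting the mean of the observed series.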
