Influencing Factors and Clustering Characteristics of COVID-19: A Global Analysis

IF 6.2; Zone 1 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics; Pub Date: 2022-07-18; DOI: 10.26599/BDMA.2022.9020010

Tianlong Zheng, Chunli Zhang, Yueting Shi, Debao Chen, Sheng Liu

Big Data Mining and Analytics, vol. 5, no. 4, pp. 318-338, 2022. Full text: https://ieeexplore.ieee.org/document/9832767/

Citations: 2

Abstract

The unprecedented coronavirus disease 2019 (COVID-19) pandemic was still raging in many countries worldwide as of 2021. Various strategies have been developed to study the characteristics and distribution of the virus across regions of the world, to assist in the prevention and control of the epidemic. In this study, descriptive statistics and regression analysis were applied to COVID-19 data from different countries to compare and evaluate various regression models. Results showed that the extreme random forest regression (ERFR) model performed best, and that in the ERFR model, factors such as population density, ozone, median age, life expectancy, and the Human Development Index (HDI) were relatively influential on the spread of COVID-19. In addition, the clustering characteristics of the epidemic were analyzed with the spectral clustering algorithm. Visualizations of the spectral clustering results showed that the global geographical distribution of COVID-19 spread was highly clustered, and that its clustering characteristics and influencing factors exhibited some consistency in their distribution. This study aims to deepen the international community's understanding of the global COVID-19 pandemic, to help countries worldwide develop measures that mitigate potential large-scale outbreaks, and to improve their ability to respond to such public health emergencies.
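The abstract reports ERFR as the best-performing model and ranks factors by their influence within it, but gives no implementation details. As an illustration only, the sketch below assumes that ERFR corresponds to extremely randomized trees (Extra-Trees) regression and shows its two core ingredients: each split uses one uniformly random threshold per feature (all features are considered, the usual regression default), and the ensemble averages many such trees. A feature's importance is taken as its normalized share of the variance reduction credited to its splits. The class `ExtraTreesSketch`, its parameters, and the synthetic three-feature dataset are hypothetical; no COVID-19 data is used.

```python
import random

# Hypothetical, self-contained sketch of extremely randomized trees (Extra-Trees)
# regression with impurity-based feature importances. Not the authors' code.


def _sse(ys):
    """Sum of squared deviations from the mean (node size times node variance)."""
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys)


def _grow(X, y, idx, rng, min_split, importances):
    """Recursively grow one tree on the rows in idx; returns a nested tuple."""
    ys = [y[i] for i in idx]
    leaf = ("leaf", sum(ys) / len(ys))
    parent = _sse(ys)
    if len(idx) < min_split or parent == 0.0:
        return leaf
    best = None
    for f in range(len(X[0])):
        vals = [X[i][f] for i in idx]
        low, high = min(vals), max(vals)
        if low == high:
            continue  # feature is constant in this node, so it cannot split
        # Extra-Trees twist: one uniformly random threshold per feature,
        # instead of searching every possible cut point.
        t = rng.uniform(low, high)
        left = [i for i in idx if X[i][f] < t]
        right = [i for i in idx if X[i][f] >= t]
        if not left or not right:
            continue
        gain = parent - _sse([y[i] for i in left]) - _sse([y[i] for i in right])
        if best is None or gain > best[0]:
            best = (gain, f, t, left, right)
    if best is None:
        return leaf
    gain, f, t, left, right = best
    importances[f] += gain  # credit the variance reduction to the chosen feature
    return ("split", f, t,
            _grow(X, y, left, rng, min_split, importances),
            _grow(X, y, right, rng, min_split, importances))


def _predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf's mean target."""
    while node[0] == "split":
        _, f, t, left, right = node
        node = left if x[f] < t else right
    return node[1]


class ExtraTreesSketch:
    """Hypothetical minimal ensemble; each tree sees the full sample (no bootstrap)."""

    def __init__(self, n_trees=30, min_split=5, seed=0):
        self.n_trees = n_trees
        self.min_split = min_split
        self.seed = seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n_features = len(X[0])
        totals = [0.0] * n_features
        self.trees = []
        for _ in range(self.n_trees):
            imp = [0.0] * n_features
            self.trees.append(_grow(X, y, list(range(len(X))), rng, self.min_split, imp))
            s = sum(imp)
            if s > 0:  # normalize per tree, then average across trees
                totals = [a + b / s for a, b in zip(totals, imp)]
        s = sum(totals)
        self.feature_importances_ = [v / s for v in totals] if s > 0 else totals
        return self

    def predict(self, X):
        # Ensemble prediction is the mean of the individual tree predictions.
        return [sum(_predict_tree(t, x) for t in self.trees) / len(self.trees) for x in X]


# Synthetic demo data (not COVID-19 data): the target depends strongly on
# feature 0, weakly on feature 1, and not at all on feature 2.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [3.0 * row[0] + 1.0 * row[1] + data_rng.gauss(0, 0.1) for row in X]

model = ExtraTreesSketch(n_trees=30, seed=1).fit(X, y)
print([round(v, 3) for v in model.feature_importances_])
```

On this synthetic data the strongly weighted feature should receive most of the importance and the irrelevant one very little. In practice a library implementation such as scikit-learn's `ExtraTreesRegressor` (with its `feature_importances_` attribute) would normally be used; the pure-Python version only makes the random-threshold splitting and tree averaging visible.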

Source journal: Big Data Mining and Analytics (Computer Science: Computer Science Applications)

CiteScore: 20.90

Self-citation rate: 2.20%

Journal description: Big Data Mining and Analytics, published by Tsinghua University Press, presents research on big data and its applications. The journal covers the exploration and analysis of large volumes of data from diverse sources to uncover hidden patterns, correlations, insights, and knowledge. Featuring the latest developments, research issues, and solutions, it provides a deep understanding of data mining techniques, data analytics, and their practical applications. Big Data Mining and Analytics is indexed and abstracted in ESCI, EI, Scopus, DBLP Computer Science, Google Scholar, INSPEC, CSCD, DOAJ, CNKI, and other databases, and is aimed at researchers, professionals, and anyone interested in big data analytics.