Enrique Luna-Ramírez, Jorge Soria-Cruz, Apolinar Velarde-Martínez, E. Taya-Acosta
Characterization of SARS-CoV-2 cases in Mexico using data mining
Journal: Revista de Computo Aplicado
Publication date: 2020-12-31
DOI: 10.35429/JCA.2020.15.4.19.25 (https://doi.org/10.35429/JCA.2020.15.4.19.25)
Citation count: 1
Abstract
This paper analyzes the data published by the Federal Government of Mexico on cases tested for the presence of the SARS-CoV-2 virus, which causes the COVID-19 disease. More than a million cases were analyzed, most of which tested positive. Twenty-one significant variables were considered, including the test result and death, together with factors that complicate a person's health, such as diabetes, chronic obstructive pulmonary disease (COPD), asthma, hypertension, obesity and smoking, among others. The data were first prepared so that they could be processed with data mining techniques, following the CRISP-DM methodology for knowledge extraction. These techniques were then used to generate data models that characterize the development of COVID-19 at the national and local (state) levels. As an important part of the models, various rules or correlations were observed among the variables; these could be used to predict, in part, the future development of COVID-19 in Mexico and, consequently, to establish best practices aimed at reducing its social impact.
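The abstract does not state which technique produced the rules among variables. As a minimal sketch, assuming the rules are association rules over boolean case attributes, the usual support, confidence and lift measures could be computed as follows. The records, the variable names and the helper functions `support` and `rule_metrics` are hypothetical and invented for illustration; they are not taken from the paper or from the government dataset.

```python
from typing import Dict, List

# Synthetic toy records, invented purely for illustration (NOT real case data).
records: List[Dict[str, bool]] = [
    {"diabetes": True,  "hypertension": True,  "death": True},
    {"diabetes": True,  "hypertension": False, "death": False},
    {"diabetes": False, "hypertension": True,  "death": False},
    {"diabetes": True,  "hypertension": True,  "death": True},
    {"diabetes": False, "hypertension": False, "death": False},
    {"diabetes": False, "hypertension": False, "death": False},
    {"diabetes": True,  "hypertension": True,  "death": False},
    {"diabetes": False, "hypertension": True,  "death": False},
]


def support(rows: List[Dict[str, bool]], items: List[str]) -> float:
    """Fraction of rows in which every variable in `items` is True."""
    if not rows:
        return 0.0
    hits = sum(all(row[item] for item in items) for row in rows)
    return hits / len(rows)


def rule_metrics(
    rows: List[Dict[str, bool]],
    antecedent: List[str],
    consequent: List[str],
) -> Dict[str, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    both = support(rows, antecedent + consequent)
    ante = support(rows, antecedent)
    cons = support(rows, consequent)
    # Confidence: how often the consequent holds when the antecedent holds.
    confidence = both / ante if ante else 0.0
    # Lift: confidence relative to the baseline rate of the consequent.
    lift = confidence / cons if cons else 0.0
    return {"support": both, "confidence": confidence, "lift": lift}


# Hypothetical rule: {diabetes, hypertension} -> {death}
metrics = rule_metrics(records, ["diabetes", "hypertension"], ["death"])
print(metrics)
```

A lift above 1 indicates that the consequent occurs more often among cases matching the antecedent than in the data overall. It measures association only and does not establish a causal effect.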