COVID-19 risk factors specification using Decision Tree based on the degree of redundancy between features

S. Mohammed, Mohammed Sami Mohammed
Published in: 2022 IEEE 3rd Global Conference for Advancement in Technology (GCAT), 2022-10-07
DOI: 10.1109/GCAT55367.2022.9971950 (https://doi.org/10.1109/GCAT55367.2022.9971950)

Abstract

Diseases that have recently spread worldwide need to be predicted and classified. Testing and examining samples is also safer when the data are collected remotely, as with COVID-19 cases. This research therefore provides a safe and accurate data mining prediction system that supports high-performance decision-making to help prevent this spread. By providing a system that detects the disease in the collected samples, the study prevents, or at least reduces, contact between suspected patients and others. It also aims to reduce the effects of COVID-19 on marketing, teaching, and other business activities by allowing cases to be isolated at home, informed by knowledge of the symptoms that are studied to identify the features that most affect the classification. In addition, the study can provide some information about how the virus spreads and about staying at home, supported by early prediction. Three techniques are applied to data from 1486 patients after preprocessing and preparation for performance evaluation. Risk factors are determined using a feature selector, and the effect of these features on the proposed models is studied before and after feature reduction. The paper presents the differences in results that arise when unnecessary data are omitted, together with the reasons for them. All the proposed models performed better after the most influential features were selected, but the decision tree (DT) achieved the best prediction accuracy, about 96%. The other evaluation parameters are also explained, and they show the DT model to be somewhat ahead of the other models.
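The abstract does not say which feature selector was used, so the following is only a minimal sketch of one common way to select features by their degree of redundancy. It assumes a greedy filter: features are ranked by the absolute Pearson correlation with the class label, and a feature is kept only if it is not too strongly correlated with any feature already kept. The symptom names, the toy data, and the 0.8 threshold are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / sqrt(var_x * var_y)


def select_features(columns, labels, max_redundancy=0.8):
    """Greedy redundancy filter (an assumed method, not the paper's).

    Rank features by |correlation with the label|, then keep a feature
    only if its |correlation| with every kept feature is at most
    max_redundancy.
    """
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], labels)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        redundant = any(
            abs(pearson(columns[name], columns[other])) > max_redundancy
            for other in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Hypothetical toy data: 8 patients, binary symptoms, binary diagnosis.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "fever":            [1, 1, 1, 1, 0, 0, 0, 1],
    "high_temperature": [1, 1, 1, 1, 0, 0, 0, 1],  # duplicates "fever"
    "cough":            [1, 0, 1, 0, 0, 1, 0, 0],
}

print(select_features(columns, labels))  # ['fever', 'cough']
```

In this toy run, "high_temperature" is dropped because it duplicates "fever". The reduced feature set would then be passed to a classifier such as a decision tree, and accuracy compared before and after the reduction, which is the comparison the abstract describes.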