Mgini - Improved Decision Tree using Minority Class Sensitive Splitting Criterion for Imbalanced Data of Covid-19

IF 0.5 | Zone 4, Computer Science | JCR Q4, COMPUTER SCIENCE, INFORMATION SYSTEMS
Pratik A. Barot, H. Jethva
Journal of Information Science and Engineering, volume "22 1" (as listed), pages 1097-1108. Published 2021-01-01. Journal Article.
DOI: 10.6688/jise.202109_37(5).0008 (https://doi.org/10.6688/jise.202109_37(5).0008)
Citations: 1

Abstract

During the COVID-19 pandemic, medical facilities have been struggling to cope. Most countries, including developed ones, have had a difficult time fighting the outbreak, and a common problem is a shortage of medical staff and equipment. Machine learning has the potential to play an important role in several areas of medical care: a machine learning model can serve as the basis of an effective diagnostic tool that helps when medical staff are scarce. However, medical data are imbalanced, and this skew prevents machine learning algorithms from achieving high accuracy. To deal with the problem of imbalanced data, we propose a modified decision tree algorithm that uses a minority-sensitive Gini index called Mgini. For an imbalanced COVID-19 dataset, it is more important to reduce the overall misclassification cost than to improve the accuracy value. Mgini is a useful splitting criterion when the cost of misclassifying a minority-class sample is much higher than that of misclassifying a majority-class sample. Using the proposed Gini index as the splitting criterion in a decision tree reduces the misclassification cost. The Mgini-based decision tree has higher accuracy and lower misclassification cost than the traditional Gini-index-based CART algorithm. The proposed cost-sensitive approach improves imbalanced data classification without the use of data-level sampling techniques.
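The abstract does not give the Mgini formula, so the following is only a minimal sketch of the general idea of a minority-sensitive splitting criterion, not the authors' method. It assumes a simple class-weighted Gini impurity in which each sample counts with a per-class weight; the names `gini`, `best_threshold`, and `class_weight` are hypothetical and chosen for illustration.

```python
def weighted_mass(labels, class_weight):
    """Total weight of a node; classes missing from class_weight count as 1.0."""
    return sum(class_weight.get(y, 1.0) for y in labels)


def gini(labels, class_weight=None):
    """Gini impurity computed from class-weighted proportions.

    With class_weight=None this is the standard Gini index used by CART.
    The weighting scheme is an illustrative assumption, not the paper's Mgini.
    """
    cw = class_weight or {}
    total = weighted_mass(labels, cw)
    if total == 0:
        return 0.0
    mass = {}
    for y in labels:
        mass[y] = mass.get(y, 0.0) + cw.get(y, 1.0)
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


def best_threshold(xs, ys, class_weight=None):
    """Return (threshold, impurity) of the best binary split on one numeric feature."""
    cw = class_weight or {}
    best = (None, float("inf"))
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        wl, wr = weighted_mass(left, cw), weighted_mass(right, cw)
        score = (wl * gini(left, cw) + wr * gini(right, cw)) / (wl + wr)
        if score < best[1]:
            best = (t, score)
    return best


# A 9:1 imbalanced node looks almost pure under the standard Gini index (about 0.18),
# but maximally impure (0.5) once the minority class is weighted 9x.
imbalanced = [0] * 9 + [1]
plain = gini(imbalanced)
reweighted = gini(imbalanced, {1: 9.0})
```

Under this assumption, a node dominated by the majority class no longer appears nearly pure, so the tree keeps splitting where minority samples remain. This shifts the criterion toward lower minority misclassification cost without resampling the data.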
Source journal
Journal of Information Science and Engineering (Engineering & Technology - Computer Science: Information Systems)
CiteScore: 2.00
Self-citation rate: 0.00%
Articles published: 4
Review time: 8 months
About the journal: The Journal of Information Science and Engineering is dedicated to the dissemination of information on computer science, computer engineering, and computer systems. The journal encourages articles on original research in the areas of computer hardware, software, man-machine interface, theory and applications, tutorial papers in the above-mentioned areas, and state-of-the-art papers on various aspects of computer systems and applications.