Mgini - Improved Decision Tree using Minority Class Sensitive Splitting Criterion for Imbalanced Data of Covid-19
Authors: Pratik A. Barot, H. Jethva
Journal: Journal of Information Science and Engineering
Publication date: 2021-01-01
DOI: https://doi.org/10.6688/jise.202109_37(5).0008
During the COVID-19 pandemic, medical facilities are struggling to cope. Most countries, including developed ones, are having a tough time fighting the outbreak. A common problem is a shortage of medical staff and medical equipment. Machine learning has the potential to play an important role in different areas of medical care. With the help of a machine learning model, an effective diagnostic tool can be built that helps when medical staff are scarce. However, medical data is imbalanced, and this skewed nature of the data prevents machine learning algorithms from achieving high accuracy. To deal with this problem of imbalanced data, we propose a modified decision tree algorithm that uses a minority-sensitive Gini index called Mgini. In an imbalanced COVID-19 dataset, it is important to focus on reducing the overall misclassification cost rather than on improving the accuracy value. Mgini is a useful splitting criterion when the misclassification cost of a minority-class sample is much higher than that of a majority-class sample. Using the proposed Gini index as the splitting criterion in the decision tree reduces the misclassification cost. The Mgini-based decision tree has higher accuracy and lower misclassification cost than the traditional Gini-index-based CART algorithm. Our proposed cost-sensitive approach improves imbalanced data classification without the use of data-level sampling techniques.
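The abstract does not state the Mgini formula, so the following is only a minimal sketch of the general idea of a minority-sensitive splitting criterion, not the authors' definition. It assumes a generic class-weighted Gini impurity as a stand-in, with label 1 as the minority class; the names `weighted_gini`, `split_impurity` and `misclassification_cost`, and all weights and costs, are hypothetical.

```python
from collections import Counter


def weighted_gini(labels, class_weight):
    """Gini impurity 1 - sum_k p_k^2, where p_k is the class-weighted
    proportion of class k. With all weights equal to 1 this is the
    standard Gini impurity used by CART. (Stand-in, NOT the paper's Mgini.)"""
    counts = Counter(labels)
    weighted = {k: class_weight[k] * c for k, c in counts.items()}
    total = sum(weighted.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((v / total) ** 2 for v in weighted.values())


def split_impurity(left, right, class_weight):
    """Impurity of a candidate split: average of the child impurities,
    each weighted by the child's class-weighted size."""
    w_left = sum(class_weight[y] for y in left)
    w_right = sum(class_weight[y] for y in right)
    total = w_left + w_right
    if total == 0:
        return 0.0
    return (w_left * weighted_gini(left, class_weight)
            + w_right * weighted_gini(right, class_weight)) / total


def misclassification_cost(y_true, y_pred, cost_fn, cost_fp):
    """Total cost, charging cost_fn per missed minority sample (label 1)
    and cost_fp per majority sample (label 0) flagged as minority."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn * cost_fn + fp * cost_fp


if __name__ == "__main__":
    node = [0, 0, 0, 1]                  # three majority, one minority
    plain = {0: 1.0, 1: 1.0}             # standard Gini
    sensitive = {0: 1.0, 1: 3.0}         # hypothetical minority weight
    print(weighted_gini(node, plain))
    print(weighted_gini(node, sensitive))
    print(split_impurity([0, 0, 0], [0, 1], sensitive))
    print(misclassification_cost([1, 1, 0, 0], [0, 1, 1, 0],
                                 cost_fn=10, cost_fp=1))
```

Under the up-weighting, a node that still contains a minority sample looks less pure than it does under the standard Gini index, so a tree grown with such a criterion is pushed to keep splitting until minority samples are isolated. This is one common way to make a split criterion cost-sensitive without resampling the data.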
About the journal:
The Journal of Information Science and Engineering is dedicated to the dissemination of information on computer science, computer engineering, and computer systems. The journal encourages articles on original research in the areas of computer hardware, software, man-machine interface, theory, and applications. It also encourages tutorial papers in the above-mentioned areas and state-of-the-art papers on various aspects of computer systems and applications.