{"title":"基于地图约简的大数据分类不平衡问题混合方法的实证评价","authors":"Khyati Ahlawat, A. Chug, A. Singh","doi":"10.4018/IJGHPC.2019070102","DOIUrl":null,"url":null,"abstract":"Imbalanced datasets are the ones with uneven distribution of classes that deteriorates classifier's performance. In this paper, SVM classifier is combined with K-Means clustering approach and a hybrid approach, Hy_SVM_KM is introduced. The performance of proposed method is also empirically evaluated using Accuracy and FN Rate measure and compared with existing methods like SMOTE. The results have shown that the proposed hybrid technique has outperformed traditional machine learning classifier SVM in mostly datasets and have performed better than known pre-processing technique SMOTE for all datasets. The goal of this article is to extend capabilities of popular machine learning algorithms and adapt it to meet the challenges of imbalanced big data classification. This article can provide a baseline study for future research on imbalanced big datasets classification and provides an efficient mechanism to deal with imbalanced nature big dataset with modified SVM classifier and improves the overall performance of the model.","PeriodicalId":43565,"journal":{"name":"International Journal of Grid and High Performance Computing","volume":"17 1","pages":"23-45"},"PeriodicalIF":0.6000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Empirical Evaluation of Map Reduce Based Hybrid Approach for Problem of Imbalanced Classification in Big Data\",\"authors\":\"Khyati Ahlawat, A. Chug, A. Singh\",\"doi\":\"10.4018/IJGHPC.2019070102\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Imbalanced datasets are the ones with uneven distribution of classes that deteriorates classifier's performance. In this paper, SVM classifier is combined with K-Means clustering approach and a hybrid approach, Hy_SVM_KM is introduced. The performance of proposed method is also empirically evaluated using Accuracy and FN Rate measure and compared with existing methods like SMOTE. The results have shown that the proposed hybrid technique has outperformed traditional machine learning classifier SVM in mostly datasets and have performed better than known pre-processing technique SMOTE for all datasets. The goal of this article is to extend capabilities of popular machine learning algorithms and adapt it to meet the challenges of imbalanced big data classification. This article can provide a baseline study for future research on imbalanced big datasets classification and provides an efficient mechanism to deal with imbalanced nature big dataset with modified SVM classifier and improves the overall performance of the model.\",\"PeriodicalId\":43565,\"journal\":{\"name\":\"International Journal of Grid and High Performance Computing\",\"volume\":\"17 1\",\"pages\":\"23-45\"},\"PeriodicalIF\":0.6000,\"publicationDate\":\"2019-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Grid and High Performance Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4018/IJGHPC.2019070102\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Grid and High Performance Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4018/IJGHPC.2019070102","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Empirical Evaluation of Map Reduce Based Hybrid Approach for Problem of Imbalanced Classification in Big Data
Imbalanced datasets are the ones with uneven distribution of classes that deteriorates classifier's performance. In this paper, SVM classifier is combined with K-Means clustering approach and a hybrid approach, Hy_SVM_KM is introduced. The performance of proposed method is also empirically evaluated using Accuracy and FN Rate measure and compared with existing methods like SMOTE. The results have shown that the proposed hybrid technique has outperformed traditional machine learning classifier SVM in mostly datasets and have performed better than known pre-processing technique SMOTE for all datasets. The goal of this article is to extend capabilities of popular machine learning algorithms and adapt it to meet the challenges of imbalanced big data classification. This article can provide a baseline study for future research on imbalanced big datasets classification and provides an efficient mechanism to deal with imbalanced nature big dataset with modified SVM classifier and improves the overall performance of the model.