Hybrid Method of Undersampling and Oversampling for Handling Imbalanced Data
Authors: Shabrina Choirunnisa, Joko Lianto
Published in: 2018 International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), 2018-11-01
DOI: 10.1109/ISRITI.2018.8864335 (https://doi.org/10.1109/ISRITI.2018.8864335)
Citations: 12
Abstract
Data imbalance occurs in many kinds of data, including data that are naturally imbalanced. When imbalanced data are processed (for example, by clustering), the imbalance can cause misclassification because the majority class dominates the minority class, which reduces accuracy. Combining oversampling and undersampling is one way to address the problem. This study aims to handle imbalanced data by combining an oversampling method with an undersampling method to obtain more representative synthetic data. The undersampling method used is the Neighborhood Cleaning Rule (NCL), and the oversampling method is Adaptive Semi-Unsupervised Weighted Oversampling (A-SUWO). After undersampling and oversampling, the data are classified using the C4.5 decision tree and Random Forest algorithms. Performance is evaluated using precision, recall, F-measure, and accuracy.
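
The four evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `binary_metrics`, the label encoding (1 for the minority class) and the 95/5 class split are assumptions chosen for illustration. The sketch shows why accuracy alone can look good on imbalanced data while precision, recall and F-measure for the minority class reveal the problem.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F-measure for one class.

    `positive` is the label treated as the class of interest
    (here, hypothetically, the minority class).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical data: 95 majority samples (label 0), 5 minority samples (label 1).
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
always_majority = [0] * 100
baseline = binary_metrics(y_true, always_majority)

# A classifier that recovers 4 of 5 minority samples at the cost of
# 6 false alarms among the majority samples.
balanced_pred = [1] * 6 + [0] * 89 + [1] * 4 + [0] * 1
improved = binary_metrics(y_true, balanced_pred)

print(baseline)
print(improved)
```

In this made-up example the always-majority classifier has the higher accuracy yet zero recall on the minority class, which is why the abstract lists precision, recall and F-measure alongside accuracy.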