The impact of Under-Sampling Techniques on Classification Accuracy in multi-class Imbalance Data
Authors: Suwanto Sanjaya, Rahmad Abdillah, Iis Afrianty
Published in: 2022 3rd International Conference on Electrical Engineering and Informatics (ICon EEI), 2022-10-19
DOI: 10.1109/IConEEI55709.2022.9972265
Citations: 0
Abstract
Class imbalance can degrade the performance of a classification technique. It can be addressed with an Under-Sampling technique, which reduces the amount of data in the majority class to match the minority class by removing some majority-class data. However, many studies do not explain the impact of Under-Sampling on classification performance. Our study uses LVQ3 and k-fold cross-validation to examine this issue: LVQ3 performs the classification, and k-fold cross-validation tests classification performance. The research parameters were learning rate (0.00001, 0.0001, 0.001, 0.01 and 0.1), window (0.001 and 0.2), n-prototype (10, 13, 26, 41 and 46), 2000 epochs, and epsilon 0.2. The results showed a significant difference in accuracy between the old and new data. This research suggests that the balance of the data distribution, the experimental setting, and differences in data sampling affect accuracy. Under-Sampling leaves the removed data unused; however, that data cannot be said to be useless, especially with regard to accuracy.
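To make the Under-Sampling step concrete, the following is a minimal sketch of random under-sampling in a multi-class setting, where every class is cut down to the size of the smallest class. It is not the authors' implementation: the abstract does not say which under-sampling variant was used, so random selection, the function name `random_under_sample`, and the toy data are all assumptions for illustration. The sketch also enumerates the parameter values listed in the abstract, assuming they were combined as a full grid, which the abstract does not state.

```python
import random
from collections import Counter, defaultdict
from itertools import product


def random_under_sample(samples, labels, seed=0):
    """Randomly drop samples so that every class matches the smallest class.

    Hypothetical helper: random selection is an assumption, not the
    method confirmed by the abstract.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # The target size is the size of the smallest (minority) class.
    target = min(len(items) for items in by_class.values())

    kept_x, kept_y = [], []
    for y, items in by_class.items():
        # Keep `target` randomly chosen samples; the rest are discarded.
        for x in rng.sample(items, target):
            kept_x.append(x)
            kept_y.append(y)
    return kept_x, kept_y


# Hypothetical toy data: three classes with 6, 3 and 2 samples.
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 2
samples = list(range(len(labels)))

balanced_x, balanced_y = random_under_sample(samples, labels)
class_counts = Counter(balanced_y)
discarded = len(samples) - len(balanced_x)

# Parameter values as listed in the abstract; combining them as a full
# grid is an assumption.
learning_rates = [0.00001, 0.0001, 0.001, 0.01, 0.1]
windows = [0.001, 0.2]
n_prototypes = [10, 13, 26, 41, 46]
grid = list(product(learning_rates, windows, n_prototypes))

print(class_counts, discarded, len(grid))
```

In this toy case most of the original samples are discarded to reach balance, which illustrates the trade-off the abstract raises: the removed data goes unused even though it may still carry information relevant to accuracy.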