Applying active learning strategy to classify large scale data with imbalanced classes
Authors: Phairod Tuntiwachiratrakun, P. Vateekul
Published in: 2016 International Conference on Control, Automation and Information Sciences (ICCAIS), 2016-10-01
DOI: 10.1109/ICCAIS.2016.7822443 (https://doi.org/10.1109/ICCAIS.2016.7822443)
Citations: 1
Abstract
Classification tasks are challenging today because data is often both large and imbalanced, which can lead to low prediction accuracy and high computation costs. Active Learning is a technique that uses only a small set of data to construct an initial classification model and then iteratively improves the model by incrementally learning from misclassified examples. In this paper, we aim to improve prediction accuracy by applying Active Learning. To address the imbalance issue, the active model was iteratively updated based on the G-mean, and undersampling was also applied. The proposed algorithm is suitable for large-scale data because it does not need the whole data set to construct a model. The experiment was conducted on two standard corpora, one of which contained more than 100,000 examples. The results showed that the prediction performance of a standard technique (Neural Network) can be improved by 5%–13% by applying the Active Learning strategy. Furthermore, this technique also outperformed other classical classification algorithms, including K-nearest neighbors (kNN), Support Vector Machine (SVM), Decision Tree (DT), Naïve Bayes (NB), and Artificial Neural Network (ANN).
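The abstract does not spell out how the G-mean or the undersampling step is computed, so the following is only a minimal sketch under stated assumptions: a binary task, the G-mean taken in its common form as the geometric mean of sensitivity and specificity, and plain random undersampling down to the minority-class size. The function names `g_mean` and `random_undersample` are hypothetical and are not taken from the paper.

```python
import math
import random


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (binary labels).

    Assumption: this is the common textbook definition; the paper's
    exact formulation is not given in the abstract.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def random_undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subset.

    Assumption: every class is randomly reduced to the size of the
    smallest class; the paper may use a different undersampling rule.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    kept = []
    for idx in by_class.values():
        kept.extend(rng.sample(idx, n_min))
    return sorted(kept)


# A model that always predicts the majority class reaches 80% accuracy
# on this toy set, yet its G-mean is 0 because sensitivity is 0.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
print(g_mean(y_true, y_pred))

# Balanced subset: 2 minority examples plus 2 sampled majority examples.
print(random_undersample(y_true))
```

The toy case illustrates why a metric such as the G-mean is a reasonable update criterion on imbalanced data: plain accuracy rewards ignoring the minority class, whereas the G-mean drops to zero as soon as one class is never predicted correctly.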