Behzad Mirzaei, Bahareh Nikpour, H. Nezamabadi-pour
Title: An under-sampling technique for imbalanced data classification based on DBSCAN algorithm
Venue: 2020 8th Iranian Joint Congress on Fuzzy and intelligent Systems (CFIS)
Publication date: 2020-09-01
DOI: https://doi.org/10.1109/CFIS49607.2020.9238718
Citations: 3
Abstract
In classification problems, accuracy depends significantly on the training data. However, data sets in real-world applications are mostly imbalanced: most of the samples belong to one class, called the majority class, whereas the other class, called the minority class, has few samples. In these situations, most classifiers run into difficulty because they are designed for samples that are distributed equally between classes. Therefore, selecting a suitable training set is an essential step in imbalanced data classification. In this paper, a novel and effective under-sampling technique is presented that selects suitable majority-class samples using the well-known DBSCAN algorithm. The most appropriate samples from the majority class are selected, and the remaining majority-class samples are removed to balance the training set. Experimental results on fifteen imbalanced data sets demonstrate the superiority of the proposed method over six other preprocessing methods.
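The abstract does not state which majority-class samples count as "most appropriate", so the following is only a minimal sketch of the general idea and not the authors' method. It assumes one plausible rule: run DBSCAN on the majority class alone, discard the points it labels as noise, and randomly subsample the rest down to the minority-class size. The function names, the `eps` and `min_pts` values, and the toy data are all hypothetical.

```python
import math
import random


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one cluster label per point; -1 marks noise."""
    n = len(points)
    # eps-neighborhood of every point (a point is its own neighbor).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # only core points expand the cluster
    return labels


def undersample_majority(X, y, majority_label, eps, min_pts, seed=0):
    """ASSUMED selection rule (not taken from the paper): drop majority points
    that DBSCAN flags as noise, then randomly subsample the remaining majority
    points so they do not outnumber the minority class."""
    maj = [i for i, lab in enumerate(y) if lab == majority_label]
    mino = [i for i, lab in enumerate(y) if lab != majority_label]
    labels = dbscan([X[i] for i in maj], eps, min_pts)
    kept = [i for i, lab in zip(maj, labels) if lab != -1]
    if len(kept) > len(mino):
        kept = random.Random(seed).sample(kept, len(mino))
    idx = sorted(kept + mino)
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy data: five clustered majority points, one majority outlier, three minority points.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_bal, y_bal = undersample_majority(X, y, majority_label=0, eps=1.5, min_pts=3)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

On this toy set the outlier at (10, 10) is removed as noise and the two classes end up the same size. In practice `eps` and `min_pts` would need tuning per data set, and a library implementation of DBSCAN would normally replace the quadratic neighbor search used here.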