Intrusion Detection based Sample Selection for imbalanced data distribution
Ikram Chairi, Souad Alaoui, A. Lyhyaoui
Second International Conference on the Innovative Computing Technology (INTECH 2012), published 2012-09-01
DOI: 10.1109/INTECH.2012.6457778 (https://doi.org/10.1109/INTECH.2012.6457778)
Citations: 8
Abstract
Most learning systems assume that training sets are balanced, but in real-world data this assumption does not always hold. Between-class imbalance has attracted growing attention from both academia and industry because of its critical influence on the performance of learning systems. Many solutions have been proposed; the most common practice is to rebalance the data set artificially using sampling methods. In this paper, we propose a method based on Sample Selection (SS) to deal with between-class imbalance. We argue that balancing the classes while paying more attention to the examples located near the border between classes improves the performance of the classifier. To reduce the computational cost of selecting samples, we first apply a clustering method to determine the critical centers, and then select samples from those critical clusters. Experimental results with a Multi-Layer Perceptron (MLP) architecture on a well-known intrusion detection data set support the usefulness of our approach.
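The two-step idea described in the abstract (cluster first, then select samples from the critical clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the clustering algorithm or how a cluster is judged critical. The sketch assumes plain k-means with farthest-first initialization, and it assumes that a majority-class cluster is "critical" when its center lies close to a minority-class center. The names `kmeans`, `select_border_samples`, `k`, and `n_critical` are hypothetical.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with farthest-first initialization.

    Returns (centers, labels). The choice of k-means is an assumption;
    the abstract only says "a clustering method".
    """
    # Farthest-first init: start at the first point, then repeatedly add
    # the point that is farthest from all centers chosen so far.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))

    def assign():
        # Attach each point to the index of its nearest center.
        return [min(range(k), key=lambda j: math.dist(p, centers[j]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        # Move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, assign()


def select_border_samples(majority, minority, k=4, n_critical=1):
    """Keep every minority sample, and keep only the majority samples that
    belong to the n_critical clusters nearest to the minority class.

    "Nearest" is measured center to center, which is cheaper than
    comparing every pair of samples. This criterion is an assumption.
    """
    maj_centers, maj_labels = kmeans(majority, k)
    min_centers, _ = kmeans(minority, min(k, len(minority)))
    # Gap between each majority center and its closest minority center.
    gap = [min(math.dist(c, m) for m in min_centers) for c in maj_centers]
    critical = set(sorted(range(k), key=lambda j: gap[j])[:n_critical])
    kept = [p for p, lab in zip(majority, maj_labels) if lab in critical]
    return kept, list(minority)
```

Working at the level of cluster centers is what keeps the selection cheap: distances are computed between a handful of centers rather than between all pairs of samples. The reduced training set returned here could then be passed to any classifier, such as the MLP mentioned in the abstract.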