{"title":"A Gaussian mixture based boosted classification scheme for imbalanced and oversampled data","authors":"B. Pal, Mahit Kumar Paul","doi":"10.1109/ECACE.2017.7912938","DOIUrl":null,"url":null,"abstract":"Dataset with imbalanced class distribution used to abate classification performance for most of the standard classifier learning algorithms. Moreover, some application area consists of scarcity of labeled training data where clustering is most prominent way to support classification process. Gaussian Mixture Model (GMM) being able to approximate arbitrary probability distribution, is a dominant tool for classification in such cases by means of clustering. An ensemble approach is presented in this paper considering GMM as a weak learner to boost the GMMs in a semi supervised manner via Adaptive Boosting technique. This paper, firstly investigates how much K-means and GMM suffers from uneven class distribution in data. Later experiment on benchmark imbalanced datasets with different imbalance ratio and over sampled datasets using Synthetic Minority Over-sampling Technique (SMOTE) has been carried out for proposed approach. For each case cluster forest has been used as an attribute selection technique. 
Efficacy of the proposed Boosted GMM approach compared to standard clustering approaches like K means and GMM is exhibited from empirical analysis.","PeriodicalId":333370,"journal":{"name":"2017 International Conference on Electrical, Computer and Communication Engineering (ECCE)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 International Conference on Electrical, Computer and Communication Engineering (ECCE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ECACE.2017.7912938","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 5
Abstract
Datasets with imbalanced class distributions tend to degrade the classification performance of most standard classifier learning algorithms. Moreover, in some application areas labeled training data are scarce, and clustering is the most prominent way to support the classification process. Because a Gaussian Mixture Model (GMM) can approximate arbitrary probability distributions, it is a dominant tool for clustering-based classification in such cases. This paper presents an ensemble approach that treats the GMM as a weak learner and boosts GMMs in a semi-supervised manner using the Adaptive Boosting technique. The paper first investigates how much K-means and GMM suffer from uneven class distributions in the data. The proposed approach is then evaluated on benchmark imbalanced datasets with different imbalance ratios and on datasets oversampled with the Synthetic Minority Over-sampling Technique (SMOTE). In each case, cluster forest is used as the attribute selection technique. Empirical analysis shows the efficacy of the proposed Boosted GMM approach compared with standard clustering approaches such as K-means and GMM.
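
The abstract names SMOTE but does not describe how the oversampled datasets were built. As a rough illustration of the general SMOTE idea only, and not the authors' implementation, the sketch below creates each synthetic minority point by interpolating between a randomly chosen minority sample and one of its nearest minority neighbours. The function name `smote` and the parameters `n_synthetic`, `k`, and `seed` are hypothetical choices made for this example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    minority: list of equal-length numeric feature vectors (minority class only).
    k: number of nearest minority neighbours considered for each base sample.
    Illustrative sketch only; all names here are assumptions, not the paper's.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy usage: four minority samples in two dimensions, six synthetic points.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the oversampled class stays inside the region the minority data already occupy, which is the property that distinguishes SMOTE from simple duplication.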