Generative oversampling method (GenOMe) for imbalanced data on apnea detection using ECG data
H. Sanabila, Ilham Kusuma, W. Jatmiko
2016 International Conference on Advanced Computer Science and Information Systems (ICACSIS), 2016-10-01
DOI: 10.1109/ICACSIS.2016.7872805
https://doi.org/10.1109/ICACSIS.2016.7872805
Citations: 7
Abstract
One difficult but important machine learning problem is imbalanced data, where one class is underrepresented while the others are dominant. The performance of most classifiers degrades significantly on imbalanced data. The two major approaches to imbalanced data are cost-sensitive learning, which modifies the classifier, and resampling, which modifies the data distribution. In this research, we employ a generative oversampling method (GenOMe) that generates new data points using a particular distribution as a constraint. We examine three distribution functions: Beta, Gamma, and Gaussian. We use Logistic Regression, Support Vector Machine (SVM), and Naive Bayes as classifiers to verify the robustness of GenOMe. The experimental results show that GenOMe outperforms both classification using the original data and classification using data oversampled with SMOTE (Synthetic Minority Oversampling Technique).
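The abstract does not give the details of how GenOMe fits or samples its distributions, so the following is only a minimal sketch of the general idea of distribution-constrained oversampling, not the authors' implementation. It assumes the Gaussian case, assumes each feature is fitted independently, and assumes the minority class is oversampled until it matches the majority count. The function names `oversample_gaussian` and `balance` are hypothetical.

```python
import numpy as np

def oversample_gaussian(X_min, n_new, rng=None):
    """Draw n_new synthetic points from per-feature Gaussians fitted to X_min.

    Assumption (not from the paper): features are treated as independent,
    so only a mean and a standard deviation are estimated per feature.
    """
    rng = np.random.default_rng(rng)
    mu = X_min.mean(axis=0)
    sigma = X_min.std(axis=0, ddof=1)  # sample standard deviation
    return rng.normal(loc=mu, scale=sigma, size=(n_new, X_min.shape[1]))

def balance(X, y, minority_label, rng=None):
    """Append synthetic minority points until both groups have equal size."""
    X_min = X[y == minority_label]
    n_new = int((y != minority_label).sum() - len(X_min))
    if n_new <= 0:
        return X, y  # already balanced, nothing to add
    X_new = oversample_gaussian(X_min, n_new, rng)
    y_new = np.full(n_new, minority_label, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])

# Toy data standing in for ECG-derived features: 90 majority, 10 minority.
data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(0, 1, (90, 3)), data_rng.normal(3, 1, (10, 3))])
y = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = balance(X, y, minority_label=1, rng=1)
```

A Beta or Gamma variant would presumably replace the fitting and sampling steps with the corresponding distribution, after rescaling features to that distribution's support. The balanced arrays can then be passed to any classifier, such as the Logistic Regression, SVM, or Naive Bayes models named in the abstract.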