Handling class imbalance in direct marketing dataset using a hybrid data and algorithmic level solutions
M. M. al-Rifaie, H. Alhakbani
2016 SAI Computing Conference (SAI), published 2016-07-13
DOI: 10.1109/SAI.2016.7556019 (https://doi.org/10.1109/SAI.2016.7556019)
Citations: 15
Abstract
Class imbalance is a major problem in machine learning. It occurs when the number of instances in the majority class is significantly greater than the number of instances in the minority class. This problem recurs in most datasets, including the direct marketing dataset used in this paper. In direct marketing, businesses are interested in identifying potential buyers, and charities wish to identify potential givers. Several solutions have been suggested in the literature to address this problem, among them data-level techniques, algorithmic-level techniques and combinations of both. This paper proposes a model for imbalanced data that uses a Hybrid of Data-level and Algorithmic-level solutions (HybridDA): it oversamples the minority class, undersamples the majority class and, additionally, optimises the cost parameter, the gamma and the kernel type of a Support Vector Machine (SVM) using a grid search. The proposed model performed competitively compared with other models on the same dataset. The data used in this work are real-world data collected from a Portuguese marketing campaign for bank-deposit subscriptions and are available from the University of California, Irvine (UCI) Machine Learning Repository.
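The hybrid idea described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. The abstract does not say which oversampling or undersampling method was used, so plain random resampling stands in here. The synthetic data, the `rebalance` helper, the target of 150 rows per class and the parameter grid values are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.utils import resample


def rebalance(X, y, target_per_class, seed=0):
    """Randomly over- or undersample every class to `target_per_class` rows.

    Hypothetical helper: random resampling is an assumed stand-in for
    whatever data-level method the paper actually uses.
    """
    X_parts, y_parts = [], []
    for label in np.unique(y):
        X_c = X[y == label]
        # Draw with replacement only when the class is too small (oversampling);
        # otherwise draw without replacement (undersampling).
        X_r = resample(
            X_c,
            replace=len(X_c) < target_per_class,
            n_samples=target_per_class,
            random_state=seed,
        )
        X_parts.append(X_r)
        y_parts.append(np.full(target_per_class, label))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Synthetic stand-in for an imbalanced dataset (roughly 10% positives).
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Data level: resample the training split only, so the test split keeps
# the original class distribution.
X_bal, y_bal = rebalance(X_train, y_train, target_per_class=150)

# Algorithmic level: grid search over the cost parameter (C), gamma and
# kernel type. The candidate values are illustrative assumptions.
param_grid = {
    "C": [0.1, 1, 10],
    "gamma": [0.01, 0.1, 1],
    "kernel": ["linear", "rbf"],
}
search = GridSearchCV(SVC(), param_grid, scoring="f1", cv=3)
search.fit(X_bal, y_bal)

best_params = search.best_params_
test_f1 = search.score(X_test, y_test)  # F1 on the untouched test split
```

One caveat with this simplified layout: because oversampling happens before cross-validation, duplicated minority rows can land in both the training and validation folds, which tends to inflate the cross-validated score. Resampling inside each fold avoids that, at the cost of a slightly longer example.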