Handling Class Imbalance in Customer Churn Prediction in Telecom Sector Using Sampling Techniques, Bagging and Boosting Trees
Sajjad Shumaly, Pedram Neysaryan, Yanhui Guo
2020 10th International Conference on Computer and Knowledge Engineering (ICCKE), 2020-10-29
DOI: 10.1109/ICCKE50421.2020.9303698
Citations: 2
Abstract
Customer churn is a serious and frequent problem in the telecommunications industry. Retaining existing customers costs much less than acquiring new ones; according to the literature, acquiring a new customer costs about five times as much as retaining an existing one. In this article, we identify customers who intend to stop using the organization's services. One of the most important problems in predicting customer churn is imbalanced data, which we address with several balancing methods and compare the results. The machine learning algorithms used in this paper are Decision Tree, Support Vector Machine, Multi-Layer Perceptron, Random Forest, and Gradient Boosting. The data were balanced using random over-sampling, random under-sampling, and SMOTE. Over-sampling and under-sampling produced acceptable and nearly identical results in terms of the area under the receiver operating characteristic curve (AUC); under-sampling showed better specificity, while over-sampling showed better sensitivity. In addition, Random Forest and Gradient Boosting performed better than the other algorithms.
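To make the balancing step concrete, the following is a minimal sketch of random over-sampling and random under-sampling, together with the sensitivity and specificity measures mentioned above. It is not the authors' implementation: the function names (`random_over_sample`, `random_under_sample`, `sensitivity_specificity`), the toy data with 90 non-churners and 10 churners, and the choice of label `1` for churn are all assumptions made for illustration. SMOTE, which synthesizes new minority samples by interpolation, is not shown.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label (hypothetical helper)."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_over_sample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_under_sample(rows, labels, seed=0):
    """Keep a random subset of larger classes so all classes match the smallest."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity), treating `positive` as the churn class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy imbalanced data (assumed for illustration): 90 non-churners (0), 10 churners (1).
rows = [[i] for i in range(100)]
labels = [0] * 90 + [1] * 10

over_rows, over_labels = random_over_sample(rows, labels)
under_rows, under_labels = random_under_sample(rows, labels)

print(Counter(over_labels))
print(Counter(under_labels))
```

On this toy data, over-sampling leaves 90 rows in each class and under-sampling leaves 10 in each. In practice, resampling would be applied only to the training split, so that the test set keeps the original class distribution.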