Churn Prediction in Telecom Industry using Machine Learning Ensembles with Class Balancing

Abdullahi Chowdhury, Shahriar Kaisar, M. Rashid, Sakib Shahriar Shafin, J. Kamruzzaman
Published in: 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), 2021-12-08
DOI: 10.1109/CSDE53843.2021.9718498 (https://doi.org/10.1109/CSDE53843.2021.9718498)

Abstract

Telecommunication service providers face a highly competitive and challenging market in which they try to retain existing customers by offering new and attractive services (e.g., unlimited local and international calls, high-speed internet, new phones). It is therefore imperative to analyse and predict customer churn behaviour more accurately. One of the major challenges in analysing churn data and building better prediction models is the imbalanced nature of the data. Customer behaviour in churn and non-churn scenarios may exhibit similar features. A single classifier, or a simple oversampling method for handling data imbalance, often struggles to identify the minority (churn) class. To overcome this issue, we introduce a model that uses a sophisticated oversampling technique in conjunction with ensemble methods, namely Random Forest, Gradient Boosting, Extreme Gradient Boosting (XGBoost), and AdaBoost. The hyperparameters of the baseline ensemble methods and of the oversampling methods were tuned in several ways to investigate their impact on prediction performance. Using a widely used, publicly available customer churn dataset, the prediction performance of the proposed model was evaluated in terms of several metrics, namely accuracy, precision, recall, F1 score, and area under the ROC curve (AUC). Our model outperformed the existing models and significantly reduced both false positive and false negative predictions.
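The abstract does not name the oversampling technique or give any implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a SMOTE-style interpolation between minority-class neighbours and shows how accuracy, precision, recall, and F1 follow from confusion-matrix counts. The function names, the toy data, and the parameter `k` are all hypothetical. The ensemble classifiers and AUC (which needs predicted scores rather than labels) are left out to keep the example dependency-free.

```python
import random
from math import dist


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen point and one of its k nearest minority neighbours.
    (Assumed SMOTE-style scheme; the paper's exact technique is not stated.)"""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for the positive (churn) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data (hypothetical): 4 churners (minority) vs 10 non-churners.
churn = [(1.0, 5.0), (1.2, 4.8), (0.9, 5.3), (1.1, 5.1)]
non_churn = [(float(i), 1.0) for i in range(10)]

# Oversample the churn class until both classes have the same size.
balanced_churn = churn + smote_like_oversample(
    churn, len(non_churn) - len(churn)
)

# Metrics for a small hand-made set of labels and predictions.
metrics = classification_metrics([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
```

In a realistic pipeline the oversampling would be applied to the training split only, so that synthetic churn samples never leak into the data used for evaluation.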