Real Time Customer Churn Scoring Model for the Telecommunications Industry
Nyashadzashe Tamuka, K. Sibanda
2020 2nd International Multidisciplinary Information Technology and Engineering Conference (IMITEC), 25 November 2020
DOI: 10.1109/IMITEC50163.2020.9334129
Citations: 1
Abstract
There are two types of customers in the telecommunications industry: pre-paid customers and contract customers. In South Africa, it is the pre-paid customers that keep telcos constantly worried, because nothing binds them to the company; they can leave and join a competitor at any time. To retain such customers, telcos need to tailor suitable solutions, especially for dissatisfied customers who could churn at any time. This requires customer churn prediction models that take advantage of big data analytics and provide the telco industry with a real-time solution. The purpose of this study was to develop a real-time customer churn prediction model. The study followed the CRISP-DM methodology and implemented three machine learning algorithms: Logistic Regression, Random Forest and Decision Tree. Watson Studio was used to deploy the model prototype. Several performance measures were derived from the confusion matrix. The results showed that all the models misclassified some cases; however, the misclassification rate of Logistic Regression was very low (2.2%) compared with Random Forest and Decision Tree, whose misclassification rates were 20.8% and 21.7% respectively. The results further showed that Random Forest and Decision Tree had good accuracy rates (78.3% and 79.2%), although neither outperformed Logistic Regression. Despite their good accuracy, these two models had the highest rates of misclassification of class events. We therefore concluded that accuracy is not a dependable measure of model performance.
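The abstract reports that several performance measures were derived from the confusion matrix and argues that accuracy can be misleading. The following Python sketch is not the authors' code (their prototype ran in Watson Studio); it only shows how the conventional measures follow from the four confusion-matrix counts, with churn treated as the positive class. The names `ConfusionMatrix` and `confusion_from_labels` are hypothetical, and the counts in the usage example are invented for illustration, not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; the positive class is 'churn' (an assumption)."""
    tp: int  # churners correctly flagged as churners
    fp: int  # non-churners wrongly flagged as churners
    fn: int  # churners the model missed
    tn: int  # non-churners correctly predicted to stay

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self) -> float:
        # Share of all predictions that were correct.
        return (self.tp + self.tn) / self.total

    def misclassification_rate(self) -> float:
        # Share of all predictions that were wrong; always 1 - accuracy.
        return (self.fp + self.fn) / self.total

    def precision(self) -> float:
        # Of the customers flagged as churners, how many really churned.
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        # Of the customers who really churned, how many were flagged.
        churners = self.tp + self.fn
        return self.tp / churners if churners else 0.0

    def specificity(self) -> float:
        # Of the customers who stayed, how many were predicted to stay.
        stayers = self.tn + self.fp
        return self.tn / stayers if stayers else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


def confusion_from_labels(y_true, y_pred, positive=1) -> ConfusionMatrix:
    """Tally a binary confusion matrix from paired true and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionMatrix(tp=tp, fp=fp, fn=fn, tn=tn)


if __name__ == "__main__":
    # Hypothetical counts: 1,000 customers, 100 of whom churn (not the paper's data).
    models = [
        ("always predicts 'stay'", ConfusionMatrix(tp=0, fp=0, fn=100, tn=900)),
        ("hypothetical model", ConfusionMatrix(tp=70, fp=50, fn=30, tn=850)),
    ]
    for name, cm in models:
        print(f"{name}: accuracy={cm.accuracy():.3f} "
              f"misclassification={cm.misclassification_rate():.3f} "
              f"recall={cm.recall():.3f} precision={cm.precision():.3f}")
```

With 100 churners among 1,000 customers, a model that never predicts churn still reaches 90% accuracy while catching none of the churners (recall 0), which is the kind of gap that motivates looking beyond accuracy. The abstract does not list which measures other than accuracy and misclassification rate the study reported, so precision, recall, specificity and F1 here are the conventional choices rather than the paper's own list.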