Predicting customer churn: A systematic literature review

JOURNAL OF DISCRETE MATHEMATICAL SCIENCES & CRYPTOGRAPHY (impact factor 1.1; JCR Q2, MATHEMATICS, APPLIED)

Publication date: 2022-10-03. DOI: 10.1080/09720529.2022.2133238

Soumi De, P. Prabu

Volume and issue (as listed): 25 1. Pages: 1965-1985. Publication type: Journal Article. Open access: no.

Citations: 0

Abstract

Churn prediction is an active research topic, and machine learning approaches have made significant contributions to it. Models built to address customer churn aim to identify customers who are at high risk of terminating the services offered by a company. An effective machine learning model therefore contributes indirectly to an organization's revenue growth by identifying "at risk" customers well in advance, which improves the success rate of retention campaigns and reduces the costs associated with churn. The aim of this study is to explore the state-of-the-art machine learning techniques used in churn prediction. It presents a systematic literature review driven by 5 research questions and rigorous quality assessment criteria, in which 38 primary studies are selected from 420 studies published between 2018 and 2021. The review identifies popular machine learning techniques used in churn prediction and provides directions for future research. Firstly, the study finds that churn models lack generalization capability across industry domains, so researchers need to explore techniques that extend beyond model experimentation in order to improve the efficiency of classifiers across domains. Secondly, traditional approaches to churn prediction depend heavily on demographic, product-usage, and revenue features alone. Recent papers, however, have integrated features derived from social network analysis into churn models and achieved satisfactory results. Furthermore, little scientific work exploits the information-rich content of customer-company interactions over email, chat conversations, and other channels; this area is the least explored. Thirdly, there is scope to investigate the effect of hybrid sampling strategies on model performance, which has not been extensively evaluated in the literature. Lastly, there is no formal guideline on the correct evaluation metrics for models applied to imbalanced churn datasets. This is a grey area that requires greater attention.
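The last point, about evaluation on imbalanced churn datasets, can be illustrated with a small sketch. It is not taken from the reviewed studies: the data are invented (1000 customers, 5% churners), and the function names `confusion_counts` and `evaluate` are hypothetical. It only shows why accuracy alone can mislead when churners are rare.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = churned."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for the churn class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented data (assumption): 50 churners followed by 950 non-churners.
y_true = [1] * 50 + [0] * 950

# Trivial baseline that predicts "no churn" for every customer.
always_stay = [0] * 1000

# Hypothetical model: finds 30 of the 50 churners, raises 40 false alarms.
model_pred = [1] * 30 + [0] * 20 + [1] * 40 + [0] * 910

baseline = evaluate(y_true, always_stay)
model = evaluate(y_true, model_pred)

print("baseline:", baseline)
print("model:   ", model)
```

On this invented data the trivial baseline reaches 0.95 accuracy but finds no churners (recall 0.0), while the hypothetical model has slightly lower accuracy (0.94) with recall 0.6 and F1 0.5. Which measure a study should report is exactly the open question the review raises; the sketch does not settle it.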

Source journal

JOURNAL OF DISCRETE MATHEMATICAL SCIENCES & CRYPTOGRAPHY (MATHEMATICS, APPLIED)

CiteScore: 3.10

Self-citation rate: 21.40%

Publication count: 126