Predicting customer churn: A systematic literature review
Soumi De, P. Prabu
Journal of Discrete Mathematical Sciences & Cryptography, vol. 25, no. 1, pp. 1965-1985. Published 2022-10-03. DOI: 10.1080/09720529.2022.2133238
Abstract
Churn prediction is an active research topic, and machine learning approaches have made significant contributions in this domain. Models built to address customer churn aim to identify customers who are at high risk of terminating the services offered by a company. An effective machine learning model therefore contributes indirectly to an organization's revenue growth by identifying “at risk” customers well in advance. This improves the success rate of retention campaigns and reduces the costs associated with churn. The aim of this study is to explore the state-of-the-art machine learning techniques used in churn prediction. A systematic literature review, driven by five research questions and rigorous quality assessment criteria, is presented. Thirty-eight primary studies were selected from 420 studies published between 2018 and 2021. The review identifies popular machine learning techniques used in churn prediction and provides directions for future research. Firstly, the study finds that churn models lack generalization capability across industry domains. Hence, it identifies a need for researchers to explore techniques that extend beyond model experimentation to improve the efficiency of classifiers across domains. Secondly, traditional approaches to churn prediction depend largely on demographic, product-usage, and revenue features alone. However, recent papers have integrated features derived from social network analysis into churn models and achieved satisfactory results. Furthermore, there is a lack of scientific work that utilizes the information-rich content of customer-company interactions via email, chat conversations, and other channels; this is the least explored area. Thirdly, there is scope to investigate the effect of hybrid sampling strategies on model performance, which has not been extensively evaluated in the literature. Lastly, there is no formal guideline on the correct evaluation metrics for models applied to imbalanced churn datasets. This is a grey area that requires greater attention.
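The point about evaluating models on imbalanced churn datasets can be made concrete with a small worked example. The sketch below is an illustration only, not a method from the review: it assumes a hypothetical dataset of 1,000 customers with a 5% churn rate (label 1 = churn) and two invented classifiers, and the helper names `confusion` and `metrics` are hypothetical. It shows how plain accuracy can look strong for a model that never flags a single churner, while recall and F1 expose the failure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # churners correctly flagged
    fp: int  # retained customers wrongly flagged as churners
    fn: int  # churners the model missed
    tn: int  # retained customers correctly left unflagged


def confusion(y_true: list[int], y_pred: list[int]) -> Confusion:
    """Tally outcomes, treating 1 (churn) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(t == 1 and p == 1 for t, p in pairs),
        fp=sum(t == 0 and p == 1 for t, p in pairs),
        fn=sum(t == 1 and p == 0 for t, p in pairs),
        tn=sum(t == 0 and p == 0 for t, p in pairs),
    )


def metrics(c: Confusion) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for the churn (positive) class."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total
    # Guard against division by zero when a model never predicts churn.
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical data: 50 churners followed by 950 retained customers.
    y_true = [1] * 50 + [0] * 950
    # Baseline that predicts "stay" for every customer.
    always_stay = [0] * 1000
    # Invented classifier: catches 40 of 50 churners, raises 20 false alarms.
    flagger = [1] * 40 + [0] * 10 + [1] * 20 + [0] * 930
    for name, preds in [("always_stay", always_stay), ("flagger", flagger)]:
        scores = metrics(confusion(y_true, preds))
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

On this invented data the always-"stay" baseline reaches 0.95 accuracy with zero recall and F1, while the second classifier improves accuracy only to 0.97 but reaches a recall of 0.8. The metrics are computed by hand to keep the sketch dependency-free; libraries such as scikit-learn provide equivalent functions. Which metric should be treated as the standard remains the open question the review highlights.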