A data-driven approach with explainable artificial intelligence for customer churn prediction in the telecommunications industry

Impact factor: 6.0; JCR Q1 (Engineering, Multidisciplinary)
Daniyal Asif, Muhammad Shoaib Arif, Aiman Mukheimer
Journal: Results in Engineering, vol. 26, Article 104629
Published: 2025-03-17 (Journal Article)
DOI: 10.1016/j.rineng.2025.104629
URL: https://www.sciencedirect.com/science/article/pii/S2590123025007066
Citation count: 0

Abstract

In the competitive telecommunications industry (TCI), retaining clients is crucial for profitability, and customer churn remains a significant challenge. Traditional machine learning (ML) models often lack the predictive power needed for complex telecom data, while black-box models provide limited transparency, which reduces trust and yields few actionable insights. This study introduces XAI-Churn TriBoost, an interpretable and explainable data-driven model developed using a dataset of over 2 million records. The model combines extreme gradient boosting (XGBoost), categorical boosting (CatBoost), and light gradient boosting machine (LightGBM) in a soft voting ensemble to enhance churn prediction. Data preprocessing included handling missing values through iterative imputation with a Bayesian ridge estimator. Sequential data scaling was implemented by combining robust, standard, and min-max scaling methods to ensure feature consistency. Feature selection was conducted using the Boruta technique with a random forest (RF), and class imbalance in the training data was addressed using the synthetic minority oversampling technique (SMOTE). XAI-Churn TriBoost achieved high predictive performance, with an accuracy of 96.44%, precision of 92.82%, recall of 87.82%, and F1 score of 90.25%. To enhance model transparency, we incorporated explainable artificial intelligence (AI) techniques, specifically local interpretable model-agnostic explanations (LIME) and Shapley additive explanations (SHAP), to interpret individual predictions and identify critical features affecting churn. Key factors impacting churn include regularity and montant (French for "amount"), offering the TCI valuable insights for targeted retention strategies. XAI-Churn TriBoost thus provides both robust performance and interpretability, highlighting its potential to support customer retention efforts in the TCI.
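The soft voting step and the reported metrics can be illustrated with a small sketch. It is not the authors' implementation: it assumes equal weights for the three base models and a 0.5 decision threshold, neither of which the abstract specifies, and the probability values and the helper names soft_vote and f1_from are hypothetical. The last lines check that the reported F1 score is consistent with the reported precision and recall, using the standard definition F1 = 2PR / (P + R).

```python
from statistics import fmean


def soft_vote(prob_by_model, threshold=0.5):
    """Average per-model churn probabilities, then threshold the mean.

    prob_by_model: one list per base model, each holding P(churn)
    for the same customers in the same order.
    Equal weights and the 0.5 threshold are assumptions of this sketch.
    """
    n_customers = len(prob_by_model[0])
    if any(len(p) != n_customers for p in prob_by_model):
        raise ValueError("all models must score the same customers")
    # zip(*...) regroups the scores so each tuple belongs to one customer.
    mean_probs = [fmean(scores) for scores in zip(*prob_by_model)]
    labels = [int(p >= threshold) for p in mean_probs]
    return mean_probs, labels


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical P(churn) from three boosted models for four customers.
xgb_probs = [0.90, 0.40, 0.20, 0.55]
cat_probs = [0.80, 0.60, 0.10, 0.35]
lgbm_probs = [0.70, 0.20, 0.30, 0.45]

mean_probs, labels = soft_vote([xgb_probs, cat_probs, lgbm_probs])
print(labels)  # [1, 0, 0, 0]

# Consistency check on the rounded figures quoted in the abstract.
print(round(100 * f1_from(0.9282, 0.8782), 2))  # 90.25
```

In this toy input the second and fourth customers are each flagged by one model alone (scores of 0.60 and 0.55), but the averaged probability stays below the threshold, so the ensemble does not label them as churners. That is the practical difference between soft voting and relying on any single model.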
Source journal: Results in Engineering (Engineering: Engineering (all))
CiteScore: 5.80
Self-citation rate: 34.00%
Articles published: 441
Review time: 47 days