Enhancing Customer Churn Prediction in the Banking Sector through Hybrid Segmented Models with Model-Agnostic Interpretability Techniques

IF 1.3, Zone 4 (Multidisciplinary Journals), Q3 MULTIDISCIPLINARY SCIENCES

National Academy Science Letters Pub Date : 2024-11-07 DOI:10.1007/s40009-024-01493-2

Astha Vashistha, Anoop Kumar Tiwari, Shubhdeep Singh Ghai, Paritosh Kumar Yadav, Sudhakar Pandey

Volume 48, Issue 4, pages 459-463. Publication type: Journal Article. Link: https://link.springer.com/article/10.1007/s40009-024-01493-2

Citations: 0

Abstract

The banking industry is experiencing a transformative period due to rapid advancements in big data and artificial intelligence, which present both significant opportunities and challenges. One of the pressing challenges in the domain of customer churn prediction (CCP) is the accurate classification of imbalanced datasets. In this study, we conduct a comprehensive investigation into CCP within the banking sector, utilizing an extensive range of datasets. We integrate robust models capable of capturing complex non-linear relationships to develop hybrid segmented models for CCP. Additionally, we introduce a novel, model-agnostic technique that extends SHAP (SHapley Additive exPlanations) to ensure the interpretability of these segmented hybrid models. The approach rigorously evaluates the performance of various predictive models across 14 customer turnover datasets. The interpretability of the new model-agnostic method is showcased through a detailed case study, providing clear insights into model decision-making processes. The staged comparison trials reveal that the Voting Classifier, XGBoost, CatBoost, and LGBoost achieve accuracies of 0.81, 0.84, 0.82, and 0.83, respectively. Among these, XGBoost demonstrates the highest prediction performance, emerging as the recommended algorithm. This study not only advances the accuracy of CCP models in the banking sector but also enhances their interpretability, facilitating more informed decision-making.
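The abstract does not describe how the SHAP extension works, so the following is only a minimal sketch of one way a segmented hybrid model could be explained in a model-agnostic manner, not the authors' method. It treats the segment router and the per-segment models as a single black-box function and computes exact Shapley values (the quantity SHAP approximates) by enumerating feature coalitions. The feature order, the balance cut-off, the two toy scoring functions, and the background rows are all hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature order (not from the paper): balance, n_products, is_active.

def model_low(x):
    """Toy churn score for the low-balance segment (illustrative only)."""
    return 0.5 - 0.05 * x[1] - 0.2 * x[2]

def model_high(x):
    """Toy churn score for the high-balance segment (illustrative only)."""
    return 0.3 + (0.1 if x[1] >= 3 else 0.0) - 0.15 * x[2]

def hybrid_predict(x):
    """Route a customer to a segment model; the 50,000 cut-off is an assumption."""
    return model_high(x) if x[0] >= 50_000 else model_low(x)

def shapley_values(predict, x, background):
    """Exact Shapley values of `predict` at `x`.

    Features outside a coalition take their values from each background row,
    and the resulting predictions are averaged. Returns (base_value, phi).
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = tuple(x[i] if i in coalition else row[i] for i in range(n))
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return value(frozenset()), phi

# Hypothetical background sample and customer to explain.
BACKGROUND = [(20_000, 1, 1), (60_000, 2, 0), (90_000, 4, 1), (10_000, 2, 0)]
customer = (80_000, 3, 0)

base, phi = shapley_values(hybrid_predict, customer, BACKGROUND)
for name, p in zip(("balance", "n_products", "is_active"), phi):
    print(f"{name}: {p:+.4f}")
print(f"base + sum(phi) = {base + sum(phi):.4f}, prediction = {hybrid_predict(customer):.4f}")
```

Because the router is part of the explained function, the attribution for the segmentation feature (here, balance) includes the effect of switching between segment models. Exact enumeration grows exponentially with the number of features, which is why sampling-based or tree-specific approximations are used in practice.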




Source journal

National Academy Science Letters (Multidisciplinary Journals)

CiteScore

2.20

Self-citation rate

0.00%

Number of articles published

Review time

12 months

Journal description: National Academy Science Letters has been published by the National Academy of Sciences, India, since 1978. The journal was started to give quick and wide publicity to innovations in all fields of science.