Interpretable Machine Learning for Predicting Customer Churn in Retail Banking

Sudi Murindanyi, Ben Wycliff Mugalu, J. Nakatumba-Nabende, Ggaliwango Marvin
Published in: 2023 7th International Conference on Trends in Electronics and Informatics (ICOEI), 2023-04-11
DOI: 10.1109/ICOEI56765.2023.10125859 (https://doi.org/10.1109/ICOEI56765.2023.10125859)
Citations: 1

Abstract

Customer churn is one of the biggest problems facing any brokerage institution, as evidenced by the rapid adoption of intelligent systems that predict churn, retain current clients, and win new ones across domains. Unfortunately, real-world datasets for training and building intelligent systems for retail banking are extremely scarce. Moreover, the machine learning (ML) models behind existing systems are all black boxes. Trends in electronics and informatics such as Explainable Artificial Intelligence (XAI), however, offer a better approach to ML model accountability. This study uses an interpretable machine learning model to transparently predict the likelihood and cause of customer churn in retail banking. Features were extracted from a real-world database (Berka) from a Czech bank using deep clustering. The resulting Berka feature dataset and a dataset from Kaggle were used for customer attrition prediction. The Synthetic Minority Over-sampling Technique (SMOTE) was then applied to handle class imbalance before training, validating, and testing four tree-based and four standard machine learning approaches. Random forest, a tree-based algorithm, performed best on both datasets: 99% accuracy, 98.5% recall, and 98.5% F1-score on the Berka dataset, and 85% accuracy, 77.5% recall, and 77% F1-score on the Kaggle dataset. Finally, Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) are used for ML model accountability. This work can be reliably used to establish trustworthy intelligent systems in the financial sector and related domains.
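The abstract names SMOTE for balancing the churn classes and reports accuracy, recall, and F1-score, but gives no implementation details. The following is a minimal pure-Python sketch of those two ideas only, not the authors' code: the function names `smote` and `accuracy_recall_f1`, the neighbour count `k`, the seed, and all toy data are assumptions made up for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def accuracy_recall_f1(y_true, y_pred, positive=1):
    """Accuracy, recall and F1-score for the positive (churn) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, recall, f1


# Toy data (made up): 8 retained customers vs. 3 churners, two features each.
retained = [(float(i), float(i % 3)) for i in range(8)]
churned = [(10.0, 10.0), (11.0, 10.0), (10.0, 12.0)]

# Oversample the churn class until both classes have the same size.
balanced_churned = churned + smote(churned, len(retained) - len(churned), k=2)

# Made-up labels and predictions, only to exercise the metric function.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
accuracy, recall, f1 = accuracy_recall_f1(y_true, y_pred)
```

Because each synthetic point lies on a segment between two real churners, oversampling adds no information beyond the minority class's existing spread. In practice, oversampling is usually applied to the training split only, so that evaluation metrics are computed on untouched data; whether the paper did so is not stated in the abstract.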