Sudi Murindanyi, Ben Wycliff Mugalu, J. Nakatumba-Nabende, Ggaliwango Marvin
2023 7th International Conference on Trends in Electronics and Informatics (ICOEI), published 2023-04-11. DOI: 10.1109/ICOEI56765.2023.10125859 (https://doi.org/10.1109/ICOEI56765.2023.10125859)
Interpretable Machine Learning for Predicting Customer Churn in Retail Banking
Customer churn is one of the biggest problems any brokerage institution faces. This is evidenced by the rapid establishment of intelligent systems that predict customer churn, retain current clients, and win new ones in various domains. Unfortunately, real-world datasets for training and establishing intelligent systems in retail banking are extremely scarce. Moreover, the Machine Learning (ML) models supporting such existing systems are all black boxes. Trends in electronics and informatics such as Explainable Artificial Intelligence (XAI) have, however, provided a better approach to ML model accountability. This study leverages an interpretable machine learning model to transparently predict the likelihood and cause of customer churn in retail banking. Deep clustering was used to extract features from a real-world database (Berka) from a Czech bank. The resulting Berka feature dataset and a second dataset from Kaggle were both used for customer attrition prediction. The Synthetic Minority Over-sampling Technique (SMOTE) was then used to handle dataset imbalance before training, validating, and testing four tree-based and four standard machine learning approaches. Random forest, a tree-based algorithm, achieved the best performance on both datasets: 99% accuracy, 98.5% recall, and 98.5% F1-score on the Berka dataset, and 85% accuracy, 77.5% recall, and 77% F1-score on the Kaggle dataset. Finally, Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) are used for ML model accountability. This work can be reliably used to establish trustworthy intelligent systems in the financial sector and related domains.
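The abstract names SMOTE as the imbalance-handling step but does not say how it was implemented. The following is a minimal sketch of the general SMOTE idea only, written in plain Python so it runs without extra packages: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The function names `smote` and `balance`, the default `k=5`, and the assumption that label 1 marks churners are all hypothetical choices for illustration, not details taken from the paper.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    if k < 1:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts.

    Assumes a binary problem; minority_label=1 (churn) is an assumption.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)
```

Calling `balance(X, y)` on a list of feature vectors and binary labels returns new lists in which the minority class has been padded with interpolated samples. In practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would likely be preferable to this sketch.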