Shimaa Ouf, Kholoud T. Mahmoud, Manal A. Abdel-Fattah
{"title":"A proposed hybrid framework to improve the accuracy of customer churn prediction in telecom industry","authors":"Shimaa Ouf, Kholoud T. Mahmoud, Manal A. Abdel-Fattah","doi":"10.1186/s40537-024-00922-9","DOIUrl":null,"url":null,"abstract":"<p>In the telecom sector, predicting customer churn has increased in importance in recent years. Developing a robust and accurate churn prediction model takes time, but it is crucial. Early churn prediction avoids revenue loss and improves customer retention. Telecom companies must identify these customers before they leave to solve this issue. Researchers have used a variety of applied machine-learning approaches to reveal the hidden relationships between different features. A key aspect of churn prediction is the accuracy level that affects the learning model's performance. This study aims to clarify several aspects of customer churn prediction accuracy and investigate state-of-the-art techniques' performance. However, no previous research has investigated performance using a hybrid framework combining the advantages of selecting suitable data preprocessing, ensemble learning, and resampling techniques. The study introduces a proposed hybrid framework that improves the accuracy of customer churn prediction in the telecom industry. The framework is built by integrating the XGBOOST classifier with the hybrid resampling method SMOTE-ENN, which concerns applying effective techniques for data preprocessing. The proposed framework is used for two experiments with three datasets in the telecom industry. This study determines which features are most crucial and influence customer churn, introduces the impact of data balancing, compares the classifiers' pre- and post-data balancing performances, and examines a speed-accuracy trade-off in hybrid classifiers. Many metrics, including accuracy, precision, recall, F1-score, and ROC curve, are used to analyze the results. All evaluation criteria are used to identify the most effective experiment. The results of the accuracy of the hybrid framework that respects balanced data outperformed applying the classifier only to imbalanced data. In addition, the results of the proposed hybrid framework are compared to previous studies on the same datasets, and the result of this comparison is offered. Compared with the review of the latest works, our proposed hybrid framework with the three datasets outperformed these works.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"58 1","pages":""},"PeriodicalIF":8.6000,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Big Data","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1186/s40537-024-00922-9","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
In the telecom sector, predicting customer churn has increased in importance in recent years. Developing a robust and accurate churn prediction model takes time, but it is crucial. Early churn prediction avoids revenue loss and improves customer retention. Telecom companies must identify these customers before they leave to solve this issue. Researchers have used a variety of applied machine-learning approaches to reveal the hidden relationships between different features. A key aspect of churn prediction is the accuracy level that affects the learning model's performance. This study aims to clarify several aspects of customer churn prediction accuracy and investigate state-of-the-art techniques' performance. However, no previous research has investigated performance using a hybrid framework combining the advantages of selecting suitable data preprocessing, ensemble learning, and resampling techniques. The study introduces a proposed hybrid framework that improves the accuracy of customer churn prediction in the telecom industry. The framework is built by integrating the XGBOOST classifier with the hybrid resampling method SMOTE-ENN, which concerns applying effective techniques for data preprocessing. The proposed framework is used for two experiments with three datasets in the telecom industry. This study determines which features are most crucial and influence customer churn, introduces the impact of data balancing, compares the classifiers' pre- and post-data balancing performances, and examines a speed-accuracy trade-off in hybrid classifiers. Many metrics, including accuracy, precision, recall, F1-score, and ROC curve, are used to analyze the results. All evaluation criteria are used to identify the most effective experiment. The results of the accuracy of the hybrid framework that respects balanced data outperformed applying the classifier only to imbalanced data. In addition, the results of the proposed hybrid framework are compared to previous studies on the same datasets, and the result of this comparison is offered. Compared with the review of the latest works, our proposed hybrid framework with the three datasets outperformed these works.
期刊介绍:
The Journal of Big Data publishes high-quality, scholarly research papers, methodologies, and case studies covering a broad spectrum of topics, from big data analytics to data-intensive computing and all applications of big data research. It addresses challenges facing big data today and in the future, including data capture and storage, search, sharing, analytics, technologies, visualization, architectures, data mining, machine learning, cloud computing, distributed systems, and scalable storage. The journal serves as a seminal source of innovative material for academic researchers and practitioners alike.