Analysis and Classification of Customer Churn Using Machine Learning Models

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), Pub Date: 2023-11-25, DOI: 10.29207/resti.v7i6.4933

Muhammad Maulana Sidiq, Dyah Anggraini



Abstract

Customer churn (customer loss) analysis has been used for years to increase profitability and strengthen relationships between companies and their customers. Analysts commonly combine exploratory data analysis (EDA), to visualize the data, with machine learning, to classify churn. This study applies six machine learning models to customer churn classification: Logistic Regression, Random Forest, Support Vector Machine (SVM), Gradient Boosting, AdaBoost, and Extreme Gradient Boosting (XGBoost). The dataset is class-imbalanced, which is typically the biggest challenge analysts face in obtaining good classification results. The Synthetic Minority Over-sampling Technique (SMOTE) is a popular method for dealing with class imbalance in datasets. The results show that XGBoost achieves the highest accuracy of the algorithms compared, 0.829424, and that oversampling with SMOTE tends to reduce the accuracy of every classification algorithm. Permutation Feature Importance (PFI) computed on the XGBoost model indicates that tenure, monthly contracts, and TV streaming are the features that affect customer churn the most.
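To make the oversampling step concrete, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is created by interpolating between a real minority sample and one of its nearest minority-class neighbours. It is a simplified illustration, not the authors' pipeline. The function name `smote_oversample`, its parameters, and the toy data are all assumptions introduced for this example, and a real analysis would more likely rely on an established library implementation.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples (simplified SMOTE sketch).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    k = min(k, len(minority) - 1)      # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()             # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data (not from the paper):
# (tenure in months, monthly charge) for customers who churned.
churned = [(1.0, 70.0), (2.0, 85.0), (3.0, 90.0), (5.0, 75.0)]
new_rows = smote_oversample(churned, n_new=6)
balanced_minority = churned + new_rows
```

Because every synthetic row is an interpolation of two real rows, it always falls inside the range already spanned by the minority class. Oversampling of this kind should be applied to the training split only, so that the test set keeps the original class distribution; evaluating on that imbalanced test set is one plausible reason accuracy can drop after SMOTE even when minority-class detection improves.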

