Analysis and Classification of Customer Churn Using Machine Learning Models
Muhammad Maulana Sidiq, Dyah Anggraini
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), Journal Article, published 2023-11-25
DOI: 10.29207/resti.v7i6.4933 (https://doi.org/10.29207/resti.v7i6.4933)
Citation count: 0
Abstract
Customer churn (customer loss) analysis has been used for years to increase profitability and strengthen relationships between companies and their customers. Past analysts have commonly used exploratory data analysis (EDA) to visualize the data and machine learning to classify churn. This study compares several machine learning models for customer churn classification: Logistic Regression, Random Forest, Support Vector Machine (SVM), Gradient Boosting, AdaBoost, and Extreme Gradient Boosting (XGBoost). However, the dataset is class-imbalanced, which is usually the biggest challenge analysts face in getting good results from machine learning classifiers. The Synthetic Minority Over-sampling Technique (SMOTE) is a popular method for dealing with class imbalance in datasets. The results show that XGBoost achieves the highest accuracy of the algorithms tested, at 0.829424, and that oversampling with SMOTE tends to reduce the accuracy of each classification algorithm. Permutation Feature Importance (PFI) applied to the XGBoost model identifies tenure, monthly contracts, and TV streaming as the features that affect customer churn the most.
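The abstract relies on two techniques without showing how they work: SMOTE oversampling and permutation feature importance. Below is a minimal NumPy sketch of both under stated assumptions. It is not the authors' code. The function names `smote` and `permutation_importance`, the toy data, and the rule-based `predict` function are all hypothetical. `predict` stands in for a trained classifier such as the paper's XGBoost model. SMOTE creates each synthetic minority row by interpolating between a minority sample and one of its k nearest minority neighbours. Permutation importance measures how much accuracy drops when a single feature column is shuffled.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch (hypothetical helper): each synthetic row lies on the
    segment between a random minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot use more neighbours than there are other minority rows
    # Pairwise Euclidean distances within the minority class; ignore self-matches.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                  # seed sample for each new row
    nn = neighbours[base, rng.integers(0, k, size=n_new)]  # one of that seed's k neighbours
    gap = rng.random((n_new, 1))                           # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nn] - X_min[base])


def permutation_importance(predict, X, y, n_repeats=10, rng=None):
    """Hypothetical helper: mean drop in accuracy when one feature column is shuffled."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break feature j's link to y
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances


# Usage on made-up data (assumption, not the paper's dataset):
# three numeric features, roughly 20% positive ("churn") labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] < -0.85).astype(int)

# Oversample the minority class until both classes have the same count.
X_min = X[y == 1]
n_new = int((y == 0).sum() - (y == 1).sum())
X_syn = smote(X_min, n_new, k=5, rng=1)
X_bal = np.vstack([X, X_syn])
y_bal = np.concatenate([y, np.ones(n_new, dtype=int)])
print("class counts after SMOTE:", np.bincount(y_bal))


# Hypothetical stand-in for a trained classifier: a fixed rule using only feature 0.
def predict(Z):
    return (Z[:, 0] < -0.85).astype(int)


imp = permutation_importance(predict, X, y, n_repeats=10, rng=2)
print("permutation importances:", imp)
```

Because the stand-in model ignores features 1 and 2, shuffling those columns leaves its predictions unchanged, so their importance is exactly zero. In a real evaluation, oversampling would normally be applied only to the training split, so that test accuracy reflects the original class distribution. The `SMOTE` class in imbalanced-learn and `sklearn.inspection.permutation_importance` in scikit-learn are library implementations of these two techniques.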