Analysis and Classification of Customer Churn Using Machine Learning Models

Muhammad Maulana Sidiq, Dyah Anggraini
Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), journal article, published 2023-11-25
DOI: 10.29207/resti.v7i6.4933 (https://doi.org/10.29207/resti.v7i6.4933)
Citations: 0

Abstract

Customer churn analysis has been used for years to increase profitability and strengthen companies' relationships with their customers. Analysts commonly combine exploratory data analysis (EDA), to visualize the data, with machine learning, to classify churn. This study evaluates six machine learning models for customer churn classification: Logistic Regression, Random Forest, Support Vector Machine (SVM), Gradient Boosting, AdaBoost, and Extreme Gradient Boosting (XGBoost). The dataset is class-imbalanced, which is usually the biggest obstacle to good classification results. The Synthetic Minority Over-sampling Technique (SMOTE) is a popular method for dealing with class imbalance. The results show that XGBoost classifies churned customers with the highest accuracy of the algorithms compared, 0.829424, and that oversampling with SMOTE tends to reduce the accuracy of each classification algorithm. Permutation Feature Importance (PFI) applied to the XGBoost model indicates that tenure, monthly contracts, and TV streaming are the features that affect customer churn the most.
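
The abstract names SMOTE without describing how it balances the classes. As a rough illustration only, the core idea can be sketched in plain Python: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority neighbours. This is a simplified sketch, not the authors' pipeline and not the implementation found in any particular library; the function name `smote`, its parameters, and the toy feature values are all assumptions made for this example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE sketch (hypothetical helper, not a library API).

    For each new point: pick a random minority sample, pick one of its
    k nearest minority neighbours, and interpolate between the two.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(p, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: each row is (tenure in months, monthly charge).
churned = [[1.0, 80.0], [2.0, 90.0], [3.0, 85.0], [5.0, 95.0]]  # minority class
stayed_count = 10  # assumed size of the majority class

# Create enough synthetic churners to match the majority class size.
new_points = smote(churned, n_synthetic=stayed_count - len(churned), k=2)
balanced_minority = churned + new_points
print(len(balanced_minority))
```

Because every synthetic point is an interpolation between two real minority samples, oversampling adds no new information about the majority class. In practice it would normally be applied to the training split only, so that the test set keeps the original class ratio.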