Stacking Ensemble Approach for Churn Prediction: Integrating CNN and Machine Learning Models with CatBoost Meta-Learner

Yan Lin Tan, Ying Han Pang, Shih Yin Ooi, Wee How Khoh, Fu San Hiew
Journal of Engineering Technology and Applied Physics. Published 2023-09-15. DOI: 10.33093/jetap.2023.5.2.12 (https://doi.org/10.33093/jetap.2023.5.2.12)
Citations: 0

Abstract

In the telecom industry, predicting customer churn is crucial for improving customer retention. The literature focuses predominantly on single classifiers. Customer data, however, is complex: it is class-imbalanced and contains multiple factors that exhibit nonlinear dependencies. In such scenarios, a single classifier may be unable to fully utilize the available information and capture the underlying interactions effectively. In contrast, ensemble learning, which combines various base classifiers, enables a more thorough analysis of the data and leads to improved prediction performance. In this paper, a heterogeneous ensemble model is proposed for churn prediction in the telecom industry. The model involves exploratory data analysis, data pre-processing and data resampling to handle class imbalance. Multiple trained base classifiers with different characteristics are integrated through a stacking ensemble technique. Specifically, a convolutional neural network (CNN), logistic regression, a decision tree and a Support Vector Machine (SVM) serve as the base classifiers. The stacking ensemble exploits the unique strengths of each base classifier and combines their outputs through a meta-learner to improve prediction performance. The efficacy of the proposed model is assessed on a real-world dataset, Cell2Cell. The empirical results demonstrate the superiority of the proposed model in churn prediction, with an F1-score of 62.4% and a recall of 60.62%.
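The stacking idea described in the abstract can be sketched in plain Python. This is an illustrative sketch only, not the authors' implementation. Everything named here is an assumption: two one-feature threshold models (`Stump`) stand in for the CNN, logistic regression, decision tree and SVM base classifiers; a small logistic regression (`LogisticMeta`) stands in for the CatBoost meta-learner; the data is synthetic rather than Cell2Cell; the resampling step is omitted; and the use of out-of-fold predictions as meta-features is a common stacking convention that the abstract does not spell out.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_data(n, seed):
    """Synthetic, imbalanced binary data (roughly 1 churner in 4). Illustrative only."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.25 else 0
        # Two noisy features whose means shift with the label.
        X.append([rng.gauss(1.0 * label, 1.0), rng.gauss(-0.8 * label, 1.0)])
        y.append(label)
    return X, y

class Stump:
    """Hypothetical base learner: scores one feature against a midpoint threshold."""
    def __init__(self, feature):
        self.feature = feature
    def fit(self, X, y):
        pos = [x[self.feature] for x, t in zip(X, y) if t == 1]
        neg = [x[self.feature] for x, t in zip(X, y) if t == 0]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        self.threshold = (mean_pos + mean_neg) / 2
        self.sign = 1.0 if mean_pos >= mean_neg else -1.0
        return self
    def predict_proba(self, X):
        return [sigmoid(self.sign * (x[self.feature] - self.threshold)) for x in X]

class LogisticMeta:
    """Hypothetical meta-learner (stand-in for CatBoost): batch gradient descent."""
    def fit(self, Z, y, lr=0.5, epochs=300):
        self.w, self.b = [0.0] * len(Z[0]), 0.0
        n = len(Z)
        for _ in range(epochs):
            grad_w, grad_b = [0.0] * len(self.w), 0.0
            for z, t in zip(Z, y):
                err = self._score(z) - t
                grad_b += err
                for j, v in enumerate(z):
                    grad_w[j] += err * v
            self.w = [w - lr * g / n for w, g in zip(self.w, grad_w)]
            self.b -= lr * grad_b / n
        return self
    def _score(self, z):
        return sigmoid(self.b + sum(w * v for w, v in zip(self.w, z)))
    def predict_proba(self, Z):
        return [self._score(z) for z in Z]

def out_of_fold(make_bases, X, y, k=5):
    """Meta-features: each row holds base-model scores from models that never saw that row."""
    n = len(X)
    Z = [[0.0] * len(make_bases()) for _ in range(n)]
    for fold in range(k):
        held = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, model in enumerate(make_bases()):
            scores = model.fit(X_tr, y_tr).predict_proba([X[i] for i in held])
            for i, s in zip(held, scores):
                Z[i][j] = s
    return Z

def fit_stack(make_bases, X, y):
    meta = LogisticMeta().fit(out_of_fold(make_bases, X, y), y)
    bases = [m.fit(X, y) for m in make_bases()]  # refit bases on all training data
    return bases, meta

def predict_stack(bases, meta, X):
    columns = [m.predict_proba(X) for m in bases]
    return meta.predict_proba([list(row) for row in zip(*columns)])

def make_bases():
    return [Stump(0), Stump(1)]

X_train, y_train = make_data(200, seed=0)
X_test, y_test = make_data(50, seed=1)
bases, meta = fit_stack(make_bases, X_train, y_train)
probs = predict_stack(bases, meta, X_test)
preds = [1 if p >= 0.5 else 0 for p in probs]
```

The key design point is that the meta-learner is trained on out-of-fold scores, so it learns how to weigh the base models from predictions made on rows those models did not train on. Swapping in real base classifiers and a CatBoost meta-learner would keep the same structure.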