Identifying Key Learning Algorithm Parameter of Forward Feature Selection to Integrate with Ensemble Learning for Customer Churn Prediction

Sabahat Tasneem, Muhammad Younas, Qasim Shafiq
{"title":"Identifying Key Learning Algorithm Parameter of Forward Feature Selection to Integrate with Ensemble Learning for Customer Churn Prediction","authors":"Sabahat Tasneem, Muhammad Younas, Qasim Shafiq","doi":"10.21015/vtse.v12i2.1811","DOIUrl":null,"url":null,"abstract":"The Telecommunication has been facing fierce growth of customer data and competition in the market for a couple of decades. Due to this situation, an analytical strategy of proactive anticipation about customer churn and their profitable retention is inevitable for Telecommunication companies. To nip this problem in the bud, a lot of research work has been conducted in the past, but still the previously introduced churn prediction models possess their own limitations, such as high dimensional data with poor information and class imbalance, which turn into barriers while being implicated in real life to attain accurate and improved predictions. This study has been conducted, basically, to identify the key Learning Algorithm parameter of Forward Feature Selection (FFS) for dimensionality reduction which can be further integrated with class Imbalance Handling Technique and Ensemble Learning (EL) to attain improved accuracy. The core objective of this study is to turn an imbalanced dataset into a balanced one for Ensemble Learning (EL) Model of Customer Churn Prediction (CCP). This study concluded that Logistic Regression (LR) based Forward Feature Selection (FFS) can outperform with Oversampling Class Imbalance Handling Techniques and Ensemble Learning (EL) by scoring 0.96% accuracy, which is the highest accuracy against benchmark studies. The resulting methodology has been named as the Logistic Regression Learning based Forward Feature Selection for ensemble Learning (LRLFFSEL) and applied over Orange dataset with 20 features and 3333 instances. In future this methodology can be evaluated over a bigger dataset and combined with some data optimization techniques to improve its accuracy.","PeriodicalId":173416,"journal":{"name":"VFAST Transactions on Software Engineering","volume":"14 5","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"VFAST Transactions on Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21015/vtse.v12i2.1811","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The Telecommunication has been facing fierce growth of customer data and competition in the market for a couple of decades. Due to this situation, an analytical strategy of proactive anticipation about customer churn and their profitable retention is inevitable for Telecommunication companies. To nip this problem in the bud, a lot of research work has been conducted in the past, but still the previously introduced churn prediction models possess their own limitations, such as high dimensional data with poor information and class imbalance, which turn into barriers while being implicated in real life to attain accurate and improved predictions. This study has been conducted, basically, to identify the key Learning Algorithm parameter of Forward Feature Selection (FFS) for dimensionality reduction which can be further integrated with class Imbalance Handling Technique and Ensemble Learning (EL) to attain improved accuracy. The core objective of this study is to turn an imbalanced dataset into a balanced one for Ensemble Learning (EL) Model of Customer Churn Prediction (CCP). This study concluded that Logistic Regression (LR) based Forward Feature Selection (FFS) can outperform with Oversampling Class Imbalance Handling Techniques and Ensemble Learning (EL) by scoring 0.96% accuracy, which is the highest accuracy against benchmark studies. The resulting methodology has been named as the Logistic Regression Learning based Forward Feature Selection for ensemble Learning (LRLFFSEL) and applied over Orange dataset with 20 features and 3333 instances. In future this methodology can be evaluated over a bigger dataset and combined with some data optimization techniques to improve its accuracy.
识别前向特征选择的关键学习算法参数,与客户流失预测的集合学习相结合
几十年来,电信业一直面临着客户数据和市场竞争的激烈增长。在这种情况下,电信公司不可避免地要采取分析策略,主动预测客户流失情况,并留住他们,使他们从中获利。为了将这一问题扼杀在萌芽状态,过去已经开展了大量研究工作,但之前推出的客户流失预测模型仍有其自身的局限性,如信息贫乏的高维数据和类不平衡,这些都成为在现实生活中实现准确和改进预测的障碍。本研究的主要目的是确定用于降低维度的前向特征选择(FFS)的关键学习算法参数,并将其与类不平衡处理技术和集合学习(EL)进一步整合,以提高准确性。本研究的核心目标是将不平衡数据集转化为客户流失预测(CCP)的集合学习(EL)模型的平衡数据集。本研究得出结论,基于逻辑回归(LR)的前向特征选择(FFS)可以超越过度采样类不平衡处理技术和集合学习(EL),获得 0.96% 的准确率,是基准研究中准确率最高的。由此产生的方法被命名为基于逻辑回归学习的前向特征选择集合学习(LRLFFSEL),并应用于具有 20 个特征和 3333 个实例的 Orange 数据集。今后,可以在更大的数据集上对该方法进行评估,并结合一些数据优化技术来提高其准确性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信