Investigating the impact of undersampling and bagging: an empirical investigation for customer attrition modeling

IF 4.5 · CAS Zone 3 (Management) · JCR Q1, Operations Research & Management Science

Annals of Operations Research · Pub Date: 2025-02-11 · DOI: 10.1007/s10479-025-06516-9

Arno De Caigny, Kristof Coussement, Matthijs Meire, Steven Hoornaert

Annals of Operations Research, vol. 346, no. 3, pp. 2401-2421. https://link.springer.com/article/10.1007/s10479-025-06516-9

Citations: 0

Abstract

Given the growing interest in using AI and analytics to support CRM decision making, we discuss why undersampling and bagging are popular techniques in customer churn prediction (CCP). The former helps tackle the class imbalance problem, and the latter improves model stability. However, the extant CCP literature is unclear on how undersampling affects model stability and predictive performance, while bagging has difficulty handling class imbalance. We therefore extend existing CCP research by benchmarking underbagging, which combines undersampling and bagging. Combining the two techniques recovers customer data that would have been lost in undersampling by using those data across multiple bags, while still passing an undersampled, more balanced training set to the classifier. In an extensive experiment on 11 real-life CCP datasets, underbagging is benchmarked against its constituents and other popular CCP classifiers in terms of predictive performance, profit, and operational efficiency. Our results indicate that underbagging is a valid and reliable alternative framework for CCP.
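The mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes one common underbagging variant (every bag keeps all churners and draws an equally sized random subset of non-churners), uses a toy nearest-centroid base learner as a stand-in for a real classifier, and runs on synthetic data. All function names (`make_underbags`, `fit_centroids`, `predict_one`, `underbagging_predict`) are hypothetical.

```python
import random
from collections import Counter


def make_underbags(labels, n_bags, rng):
    """Return one index list per bag, each balanced between the two classes.

    Assumption: label 1 = churner (minority), label 0 = non-churner (majority).
    """
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    bags = []
    for _ in range(n_bags):
        # Undersample: draw as many majority rows as there are minority rows.
        sampled_majority = rng.sample(majority, len(minority))
        bags.append(minority + sampled_majority)
    return bags


def fit_centroids(X, y, idx):
    """Toy base learner (illustrative stand-in): per-class feature means."""
    centroids = {}
    for cls in (0, 1):
        rows = [X[i] for i in idx if y[i] == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_one(centroids, x):
    """Predict the class whose centroid is closest in squared distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return 1 if dist(centroids[1]) < dist(centroids[0]) else 0


def underbagging_predict(models, x):
    """Aggregate the bagged models as the fraction that vote 'churn'."""
    return sum(predict_one(m, x) for m in models) / len(models)


rng = random.Random(0)
# Synthetic, imbalanced toy data: 10 churners versus 90 non-churners.
X = [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(10)] + \
    [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(90)]
y = [1] * 10 + [0] * 90

bags = make_underbags(y, n_bags=15, rng=rng)
models = [fit_centroids(X, y, bag) for bag in bags]

# Majority rows seen by at least one model; a single undersample keeps only 10.
majority_used = {i for bag in bags for i in bag if y[i] == 0}
churn_score = underbagging_predict(models, [2.0, 2.0])
```

In this sketch each model is trained on a balanced bag of 20 rows, yet the ensemble as a whole draws on many more non-churners than a single undersampled training set would, which is the data-recovery effect the abstract refers to.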


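Source journal: Annals of Operations Research (Management Science: Operations Research & Management Science)

CiteScore: 7.90

Self-citation rate: 16.70%

Articles published: 596

Review time: 8.4 months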

About the journal: The Annals of Operations Research publishes peer-reviewed original articles dealing with key aspects of operations research, including theory, practice, and computation. The journal publishes full-length research articles, short notes, expositions and surveys, reports on computational studies, and case studies that present new and innovative practical applications. In addition to regular issues, the journal publishes periodic special volumes that focus on defined fields of operations research, ranging from the highly theoretical to the algorithmic and the applied. These volumes have one or more Guest Editors who are responsible for collecting the papers and overseeing the refereeing process.