Sequential one-step estimator by sub-sampling for customer churn analysis with massive data sets

IF 1.0 | Zone 4, Mathematics | JCR Q3, Statistics & Probability

Journal of the Royal Statistical Society Series C-Applied Statistics, 71(5), pp. 1753-1786. Pub Date: 2022-09-19. DOI: 10.1111/rssc.12597. URL: https://onlinelibrary.wiley.com/doi/10.1111/rssc.12597

Feifei Wang, Danyang Huang, Tianchen Gao, Shuyuan Wu, Hansheng Wang

Citations: 0


Sequential one-step estimator by sub-sampling for customer churn analysis with massive data sets

Customer churn is one of the most important concerns for large companies. Currently, massive data are often encountered in customer churn analysis, which bring new challenges for model computation. To cope with these concerns, sub-sampling methods are often used to accomplish data analysis tasks of large scale. To cover more informative samples in one sampling round, classic sub-sampling methods need to compute non-uniform sampling probabilities for all data points. However, this method creates a huge computational burden for data sets of large scale and therefore, is not applicable in practice. In this study, we propose a sequential one-step (SOS) estimation method based on repeated sub-sampling data sets. In the SOS method, data points need to be sampled only with uniform probabilities, and the sampling step is conducted repeatedly. In each sampling step, a new estimate is computed via one-step updating based on the newly sampled data points. This leads to a sequence of estimates, of which the final SOS estimate is their average. We theoretically show that both the bias and the standard error of the SOS estimator can decrease with increasing sub-sampling sizes or sub-sampling times. The finite sample SOS performances are assessed through simulations. Finally, we apply this SOS method to analyse a real large-scale customer churn data set in a securities company. The results show that the SOS method has good interpretability and prediction power in this real application.
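The procedure described in the abstract (repeated uniform sub-sampling, a one-step update on each new sub-sample, and averaging of the resulting estimates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a logistic regression churn model, a pilot estimate obtained by fully fitting the first sub-sample, and one-step Newton updates that start from the running average of earlier estimates. The function names (`newton_step`, `sos_estimate`), the simulated data, and all parameter values are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def newton_step(theta, X, y):
    """One Newton-Raphson step for the logistic log-likelihood on (X, y)."""
    p = sigmoid(X @ theta)
    n = len(y)
    score = X.T @ (y - p) / n                       # gradient of the mean log-likelihood
    info = (X * (p * (1.0 - p))[:, None]).T @ X / n  # negative Hessian
    return theta + np.linalg.solve(info, score)


def sos_estimate(X, y, n_sub, n_rounds, rng, pilot_iters=25):
    """Sketch of a sequential one-step estimator (assumed form, see lead-in)."""
    N, d = X.shape

    # Pilot estimate (assumption): run Newton to convergence on one uniform sub-sample.
    idx = rng.choice(N, size=n_sub, replace=False)
    theta = np.zeros(d)
    for _ in range(pilot_iters):
        theta = newton_step(theta, X[idx], y[idx])

    avg = theta.copy()  # running average of all estimates so far
    for k in range(1, n_rounds):
        # Uniform sub-sampling: no per-point sampling probabilities are computed.
        idx = rng.choice(N, size=n_sub, replace=False)
        # One-step update on the new sub-sample, started from the running average.
        new_est = newton_step(avg, X[idx], y[idx])
        avg = (k * avg + new_est) / (k + 1)
    return avg


# Simulated churn-like data (hypothetical): intercept plus two covariates.
rng = np.random.default_rng(0)
N = 100_000
theta_true = np.array([-1.0, 0.5, -0.5])
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = (rng.random(N) < sigmoid(X @ theta_true)).astype(float)

theta_hat = sos_estimate(X, y, n_sub=2000, n_rounds=20, rng=rng)
print("true:", theta_true)
print("SOS :", np.round(theta_hat, 3))
```

Each round touches only `n_sub` rows, so the cost per round does not depend on the full data size `N`; increasing `n_sub` or `n_rounds` is the lever the abstract associates with smaller bias and standard error.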


Source journal

Journal of the Royal Statistical Society Series C-Applied Statistics (Mathematics: Statistics & Probability)

CiteScore: 2.50

Self-citation rate: 0.00%

Articles published:

Review time: >12 weeks

About the journal: The Journal of the Royal Statistical Society, Series C (Applied Statistics) is a journal of international repute for statisticians both inside and outside the academic world. The journal is concerned with papers which deal with novel solutions to real life statistical problems by adapting or developing methodology, or by demonstrating the proper application of new or existing statistical methods to them. At their heart therefore the papers in the journal are motivated by examples and statistical data of all kinds. The subject-matter covers the whole range of inter-disciplinary fields, e.g. applications in agriculture, genetics, industry, medicine and the physical sciences, and papers on design issues (e.g. in relation to experiments, surveys or observational studies). A deep understanding of statistical methodology is not necessary to appreciate the content. Although papers describing developments in statistical computing driven by practical examples are within its scope, the journal is not concerned with simply numerical illustrations or simulation studies. The emphasis of Series C is on case-studies of statistical analyses in practice.