Authors: K. B. Subramanya, Arun Kumar Somani
Published in: 2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence, volume 107 1, pages 531-536
Publication date: 2017-10-30
DOI: 10.1201/B21822-13 (https://doi.org/10.1201/B21822-13)
Enhanced feature mining and classifier models to predict customer churn for an E-retailer
Customer churn, the event of a customer abandoning an established relationship with a business, is an important problem that has been well researched for both academic and commercial interest. In this work, we propose an improved churn prediction model that emphasizes an effective data collection pipeline drawing on varied channels to capture explicit and implicit customer footprints. The goal of this paper is to demonstrate the improvement in classifier efficiency obtained with an extended feature set and feature selection algorithms. The prominent features that play a vital role in customer churn are also ranked. The contributions of this paper fall into three parts. First, we discuss how popular data mining tools in the Hadoop stack help extract several implicit customer interaction metrics, including Sales and Clickstream logs generated as a result of customer interaction. Second, through feature engineering techniques, we verify that some of the new features we propose have a definite impact on customer churn. Finally, we demonstrate that Regularized Logistic Regression, SVM and Gradient Boost Random Forests are the best-performing models for predicting customer churn, as verified through comprehensive cross-validation techniques.
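To illustrate the last point, the following is a minimal sketch of how an L2-regularized logistic regression churn classifier can be scored with k-fold cross-validation. It is not the authors' implementation: the data is synthetic, the two features (`days_since_last_order`, `sessions_per_week`) are hypothetical stand-ins for sales and clickstream signals, and the optimizer, learning rate, and regularization strength `lam` are assumptions chosen only to keep the example self-contained.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg_l2(X, y, lam=0.1, lr=0.5, epochs=300):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    The bias term is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X):
    # Label a customer as churned (1) when the predicted probability is >= 0.5.
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


def k_fold_accuracy(X, y, k=5, lam=0.1, seed=0):
    """Mean held-out accuracy over k folds of a shuffled index list."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w, b = fit_logreg_l2([X[i] for i in train], [y[i] for i in train], lam=lam)
        preds = predict(w, b, [X[i] for i in held_out])
        correct = sum(p == y[i] for p, i in zip(preds, held_out))
        scores.append(correct / len(held_out))
    return sum(scores) / k


def make_synthetic_customers(n=200, seed=42):
    """Synthetic rows of [days_since_last_order, sessions_per_week].

    Both features are hypothetical and already on a standardized scale.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        churned = rng.random() < 0.5
        recency = rng.gauss(1.0 if churned else -1.0, 1.0)
        activity = rng.gauss(-1.0 if churned else 1.0, 1.0)
        X.append([recency, activity])
        y.append(1 if churned else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_customers()
    print(f"5-fold CV accuracy: {k_fold_accuracy(X, y):.3f}")
```

Comparing the cross-validated score across several values of `lam`, or with and without a candidate feature column, is one simple way to check whether a new feature or a stronger penalty helps on held-out customers.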