Enhanced feature mining and classifier models to predict customer churn for an E-retailer
Authors: K. B. Subramanya, Arun Kumar Somani
Published in: 2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence, volume 107 1, pages 531-536
Publication date: 2017-10-30
DOI: 10.1201/B21822-13 (https://doi.org/10.1201/B21822-13)
Citation count: 12
Abstract
Customer churn, the event of a customer abandoning an established relationship with a business, is a well-researched problem of both academic and commercial interest. In this work, we propose an improved churn prediction model that emphasizes an effective data collection pipeline across varied channels, capturing explicit and implicit customer footprints. The goal of this paper is to demonstrate the improvement in classifier efficiency obtained by using an extended feature set and feature selection algorithms. We also rank the prominent features that play a vital role in customer churn. The contributions of this paper are threefold. First, we discuss how popular data mining tools in the Hadoop stack help extract several implicit customer interaction metrics, including Sales and Clickstream logs generated by customer interaction. Second, through feature engineering techniques, we verify that some of the new features we propose have a definite impact on customer churn. Finally, we demonstrate that Regularized Logistic Regression, SVM, and Gradient Boost Random Forests are the best-performing models for predicting customer churn, as verified through comprehensive cross-validation techniques.
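To make the evaluation idea concrete, the following is a minimal sketch of one of the named model families, L2-regularized logistic regression, scored with k-fold cross-validation and followed by a crude feature ranking by coefficient magnitude. It is not the authors' pipeline. The data is synthetic, the feature names (`inactivity`, `sessions`, `noise`) are hypothetical, and the hyperparameters (`l2`, `lr`, `epochs`) are illustrative assumptions. It uses only the Python standard library so that it runs without extra dependencies.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logreg(X, y, l2=0.1, lr=0.1, epochs=300):
    """Batch gradient descent on L2-regularized logistic loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The L2 penalty shrinks weights toward zero; the bias is not penalized.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    # Predict churn (1) when the linear score is non-negative, i.e. p >= 0.5.
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0

def cross_val_accuracy(X, y, k=5, **fit_kwargs):
    """Mean held-out accuracy over k contiguous folds (rows assumed already shuffled)."""
    n = len(X)
    scores = []
    for fold in range(k):
        lo, hi = fold * n // k, (fold + 1) * n // k
        X_train, y_train = X[:lo] + X[hi:], y[:lo] + y[hi:]
        w, b = fit_logreg(X_train, y_train, **fit_kwargs)
        correct = sum(predict(w, b, x) == t for x, t in zip(X[lo:hi], y[lo:hi]))
        scores.append(correct / (hi - lo))
    return sum(scores) / k

def make_synthetic_customers(n=300, seed=0):
    """Hypothetical data: churn depends on two features; the third is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        inactivity = rng.gauss(0, 1)  # stand-in for standardized days since last order
        sessions = rng.gauss(0, 1)    # stand-in for standardized clickstream sessions
        noise = rng.gauss(0, 1)       # unrelated to churn by construction
        p_churn = sigmoid(2.0 * inactivity - 1.5 * sessions)
        X.append([inactivity, sessions, noise])
        y.append(1 if rng.random() < p_churn else 0)
    return X, y

FEATURES = ["inactivity", "sessions", "noise"]  # hypothetical feature names
X, y = make_synthetic_customers()
cv_accuracy = cross_val_accuracy(X, y, k=5, l2=0.01)

# Refit on all rows and rank features by absolute coefficient size.
w, b = fit_logreg(X, y, l2=0.01)
ranking = sorted(zip(FEATURES, w), key=lambda t: abs(t[1]), reverse=True)

print(f"5-fold CV accuracy: {cv_accuracy:.3f}")
for name, weight in ranking:
    print(f"{name}: {weight:+.3f}")
```

Ranking by coefficient magnitude is only meaningful here because the synthetic features share the same scale. With real sales and clickstream features, standardization or a dedicated feature selection method would be needed before comparing weights.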