Churn Prediction in Telecommunications Industry Based on Conditional Wasserstein GAN
Authors: Chang Su, Linglin Wei, Xianzhong Xie
Published in: 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC)
Publication date: 2022-12-01
DOI: 10.1109/HiPC56025.2022.00034 (https://doi.org/10.1109/HiPC56025.2022.00034)
Citations: 0
Abstract
In recent years, as the telecommunications industry has globalized and advanced, competition in the telecommunications market has intensified, accompanied by high customer churn rates. Telecom operators therefore urgently need effective marketing strategies to retain customers. Customer churn prediction is an important means of preventing churn, but because data in the telecommunications industry are imbalanced, prediction results are often unsatisfactory. The most common way to improve prediction performance is to oversample the minority class. Standard methods such as SMOTE usually focus only on the minority class samples and tend to ignore the relationship between minority and majority class samples. In addition, for high-dimensional data with complex distributions, the Euclidean distance used by the SMOTE algorithm is not particularly meaningful, and SMOTE tends to underperform. Generative Adversarial Networks (GANs), by contrast, can model complex distributions and can in principle be used to generate minority class samples. This paper therefore adopts a combined GAN model (CWGAN), based on Wasserstein GAN with Gradient Penalty (WGAN-GP) and Conditional GAN (CGAN), to handle imbalanced data in the telecom industry. This is also the first time that a GAN has been used to address the data imbalance problem in the telecom industry. The paper also introduces a hybrid attention mechanism (CBAM) to help the generator focus on features relevant to the classification task. Finally, the effectiveness of the adopted method is demonstrated on four commonly used machine learning classifiers.
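To make the criticism of SMOTE concrete, the following is a minimal sketch of SMOTE-style oversampling, not the paper's method and not a reference SMOTE implementation. It assumes a toy two-feature dataset; the function name smote_like_oversample, the parameters k and seed, and the sample values are all hypothetical. Note that the sketch looks only at minority samples and relies on Euclidean distance to pick neighbours, which are the two properties the abstract identifies as weaknesses.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Illustrative only: majority-class samples are never consulted.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two features per customer, churners as the minority class.
churners = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote_like_oversample(churners, n_new=6)
print(len(new_points))
```

Because every synthetic point lies on a line segment between two existing minority samples, this kind of interpolation cannot leave the region those samples already span. A conditional generator, as in the CWGAN approach the abstract describes, is instead trained to sample from a learned class-conditional distribution.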