Addressing Class Imbalance in Customer Response Modeling Using Random and Clustering-Based Undersampling and SVM
Authors: Ljiljana Kašćelan, Sunčica Vuković
Journal: IPSI Transactions on Internet Research, volume 65 2
Published: 2024-07-01
DOI: 10.58245/ipsi.tir.2402.08 (https://doi.org/10.58245/ipsi.tir.2402.08)
The main challenge in machine learning-based customer response models is the class imbalance problem, i.e., the small number of respondents compared to non-respondents. To overcome this issue, an approach was tested in which the training data are preprocessed using a Support Vector Machine (SVM) trained either on a balanced sample obtained by random undersampling (B-SVM) or on a balanced sample obtained by clustering-based undersampling (CB-SVM). Several classifiers were tested on the resulting balanced dataset to compare their predictive performance. The results demonstrate that the approach effectively preprocesses the training data and, in turn, reduces noise and overcomes the class imbalance problem. It achieved better predictive performance than standard training data balancing techniques such as undersampling and SMOTE. CB-SVM gives better sensitivity, while B-SVM gives a better ratio of sensitivity to specificity. Organizations can use this approach to balance training data automatically and simply, and to select more efficiently the customers who should be targeted in upcoming direct marketing campaigns.
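The two balancing steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. The abstract does not specify the clustering algorithm, the number of clusters, or how the SVM is applied afterwards, so everything below is an assumption made for illustration: plain k-means with one cluster per respondent, each cluster represented by the real non-respondent row nearest its centroid, synthetic two-feature data, and hypothetical function names (`random_undersample`, `kmeans_centroids`, `cluster_undersample`). The SVM training step itself is not shown.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def random_undersample(majority, minority, seed=0):
    """Keep every minority row plus an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), list(minority)


def kmeans_centroids(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns k centroids as tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


def cluster_undersample(majority, minority, seed=0):
    """Represent the majority class by the real row nearest to each of
    k = len(minority) centroids (an assumption; rows may repeat in rare cases)."""
    centroids = kmeans_centroids(majority, len(minority), seed=seed)
    reps = [min(majority, key=lambda p: dist(p, c)) for c in centroids]
    return reps, list(minority)


# Synthetic, illustrative data: 200 non-respondents and 20 respondents.
rng = random.Random(42)
non_respondents = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
respondents = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(20)]

maj_r, min_r = random_undersample(non_respondents, respondents)
maj_c, min_c = cluster_undersample(non_respondents, respondents)
print(len(maj_r), len(min_r), len(maj_c), len(min_c))  # 20 20 20 20
```

Either balanced sample could then be passed to an SVM implementation of the reader's choice. Random undersampling is cheaper, while the clustering variant tries to keep one representative per region of the majority class.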