Addressing Class Imbalance in Customer Response Modeling Using Random and Clustering-Based Undersampling and SVM

IPSI Transactions on Internet Research, published 2024-07-01, DOI: 10.58245/ipsi.tir.2402.08

Ljiljana Kašćelan, Sunčica Vuković


Citations: 0

Abstract



The main challenge in machine-learning-based customer response modeling is class imbalance, i.e., the small number of respondents compared with non-respondents. To overcome this issue, an approach was tested that preprocesses the training data with a Support Vector Machine (SVM) trained on a balanced sample obtained either by random undersampling (B-SVM) or by clustering-based undersampling (CB-SVM). Several classifiers were then applied to the resulting balanced dataset to compare their predictive performance. The results demonstrate that the approach effectively preprocesses the training data, which in turn reduces noise and overcomes the class imbalance problem. The approach achieved better predictive performance than standard training-data balancing techniques such as undersampling and SMOTE. CB-SVM yields higher sensitivity, while B-SVM yields a better balance between sensitivity and specificity. Organizations can use this approach to balance training data automatically and simply, and to select more efficiently the customers to target in their next direct marketing campaigns.
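The abstract describes the B-SVM and CB-SVM pipeline only at a high level. The sketch below shows one possible reading of it. It rests on several assumptions that the abstract does not confirm:

- "Preprocessing with an SVM" is taken to mean relabeling the full training set with the predictions of an SVM fitted on a balanced sample.
- Clustering-based undersampling is implemented as replacing the non-respondents with k-means centroids, where k equals the number of respondents.
- A linear SVM trained by Pegasos-style subgradient descent stands in for whatever kernel and solver the authors used.
- A nearest-centroid model stands in for the downstream classifiers.

All function names are hypothetical. The code is plain Python and does not depend on any third-party library.

```python
# Hypothetical sketch, not the authors' code. Assumptions: "SVM preprocessing" = relabel the
# training set with predictions of an SVM fitted on a balanced sample; a linear SVM stands in
# for the authors' kernel/solver; nearest-centroid stands in for the downstream classifiers.
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Dataset = Tuple[List[Vector], List[int]]  # (features, labels); 1 = respondent, 0 = non-respondent


def random_undersample(X: Sequence[Vector], y: Sequence[int], rng: random.Random) -> Dataset:
    """Keep every respondent and an equally sized random subset of non-respondents."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    keep = pos + rng.sample(neg, len(pos))
    return [list(X[i]) for i in keep], [y[i] for i in keep]


def _dist2(a: Vector, b: Vector) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def _kmeans(points: List[Vector], k: int, rng: random.Random, iters: int = 50) -> List[Vector]:
    """Plain Lloyd's k-means; returns the k centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: _dist2(p, centroids[c]))].append(p)
        for j, g in enumerate(groups):
            if g:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def cluster_undersample(X: Sequence[Vector], y: Sequence[int], rng: random.Random) -> Dataset:
    """Replace the non-respondents by k-means centroids, with k = number of respondents."""
    pos = [list(x) for x, t in zip(X, y) if t == 1]
    neg = [list(x) for x, t in zip(X, y) if t == 0]
    centroids = _kmeans(neg, len(pos), rng)
    return pos + centroids, [1] * len(pos) + [0] * len(centroids)


def train_linear_svm(X: List[Vector], y: List[int], rng: random.Random,
                     lam: float = 0.01, epochs: int = 200) -> Callable[[Vector], int]:
    """Pegasos-style subgradient descent on the L2-regularized hinge loss."""
    data = [list(x) + [1.0] for x in X]  # constant feature plays the role of the bias
    signs = [1 if t == 1 else -1 for t in y]
    w = [0.0] * len(data[0])
    step = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):  # one shuffled pass
            step += 1
            eta = 1.0 / (lam * step)
            margin = signs[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the L2 term
            if margin < 1.0:  # hinge loss active: step towards the correct side
                w = [wj + eta * signs[i] * xj for wj, xj in zip(w, data[i])]
    return lambda x: 1 if sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])) >= 0 else 0


def svm_preprocess(X: List[Vector], y: List[int],
                   balance: Callable[..., Dataset], rng: random.Random) -> List[int]:
    """Fit an SVM on a balanced sample, then relabel the full training set with it."""
    Xb, yb = balance(X, y, rng)  # random_undersample -> "B-SVM", cluster_undersample -> "CB-SVM"
    svm = train_linear_svm(Xb, yb, rng)
    return [svm(x) for x in X]


def train_nearest_centroid(X: List[Vector], y: List[int]) -> Callable[[Vector], int]:
    """Placeholder downstream classifier trained on the SVM-relabeled data."""
    centroids: Dict[int, Vector] = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        if rows:
            centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: _dist2(x, centroids[c]))


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

To use it, call `svm_preprocess(X, y, random_undersample, random.Random(0))` for B-SVM-style labels, or pass `cluster_undersample` instead for CB-SVM-style labels. Train any downstream model on `X` with those labels, then score it with `sensitivity_specificity` against the true labels of a held-out set. A practical implementation would more likely use a library SVM, such as scikit-learn's `SVC`, than this hand-rolled solver.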
