Addressing Class Imbalance in Customer Response Modeling Using Random and Clustering-Based Undersampling and SVM

Ljiljana Kašćelan, Sunčica Vuković
Journal: IPSI Transactions on Internet Research
Volume: 65 2
Publication date: 2024-07-01
Publication type: Journal Article
DOI: 10.58245/ipsi.tir.2402.08 (https://doi.org/10.58245/ipsi.tir.2402.08)

Abstract

The main challenge in machine learning-based customer response models is class imbalance: respondents are far fewer than non-respondents. To address this, the study tested an approach that preprocesses the training data with a Support Vector Machine (SVM) trained either on a balanced sample obtained by random undersampling (B-SVM) or on a balanced sample obtained by clustering-based undersampling (CB-SVM). Several classifiers were then tested on the resulting balanced dataset to compare their predictive performance. The results show that the approach preprocesses the training data effectively, which in turn reduces noise and overcomes the class imbalance problem. It achieved better predictive performance than standard training-data balancing techniques such as undersampling and SMOTE. CB-SVM gives better sensitivity, while B-SVM gives a better ratio of sensitivity to specificity. Organizations can use this approach to balance training data automatically and simply, and to select more efficiently the customers to target in upcoming direct marketing campaigns.
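
To make two of the terms in the abstract concrete, the following is a minimal sketch of random undersampling and of the sensitivity and specificity measures used to compare models. It is an illustration only, not the authors' implementation: the function names `random_undersample` and `sensitivity_specificity`, the toy data, and the choice of label `1` for respondents are all assumptions, and the SVM training and clustering-based variants described in the paper are not shown.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly keep as many rows of each class as the smallest class has.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    # Sample n_min indices from every class without replacement.
    keep = []
    for idxs in by_class.values():
        keep.extend(rng.sample(idxs, n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if (tp + fn) else 0.0  # share of respondents found
    spec = tn / (tn + fp) if (tn + fp) else 0.0  # share of non-respondents rejected
    return sens, spec


# Toy data (assumed): 2 respondents (label 1) among 10 customers.
X = [[i] for i in range(10)]
y = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

X_bal, y_bal = random_undersample(X, y, seed=42)
balanced_counts = Counter(y_bal)  # both classes now have 2 rows

# Toy predictions (assumed) to show the two measures.
sens, spec = sensitivity_specificity([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

In this sketch the majority class is cut down to the size of the minority class, so every respondent is kept. A higher sensitivity means more actual respondents are identified, while a higher specificity means fewer non-respondents are targeted by mistake, which is the trade-off the abstract reports between CB-SVM and B-SVM.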