A kernel-based sampling to train SVM with imbalanced data set

IEEE Conference Anthology Pub Date : 1900-01-01 DOI:10.1109/ANTHOLOGY.2013.6784693

Zhi-qiang Zeng, Shunzhi Zhu

Citations: 4

Abstract

Combining out-class sampling with in-class sampling is a popular strategy for training a Support Vector Machine (SVM) classifier on imbalanced data sets. However, it may lead to inconsistency because the sampling strategy and the SVM operate in different spaces. This paper presents a kernel-based over-sampling approach to overcome this drawback. The method first preprocesses the data using both in-class and out-class sampling to generate minority instances in the feature space; the pre-images of these synthetic samples are then found using a distance relation between the input space and the feature space. Finally, the pre-images are appended to the original minority class data set to train an SVM. Experiments on real data sets indicate that, compared with existing over-sampling techniques, the proposed strategy generates higher-quality samples. As a result, SVM classification on imbalanced data sets becomes more effective.
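The abstract does not give the algorithm in detail, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes an RBF kernel, SMOTE-style interpolation between two minority points in the feature space, and a least-squares (trilateration) step to recover a pre-image from input-space distances. All function names (`feature_sq_dist`, `input_sq_dist`, `pre_image`, `kernel_oversample`) and the parameter `gamma` are hypothetical.

```python
import math
import random


def rbf(x, y, gamma):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2); note k(x, x) = 1."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(x, y)))


def feature_sq_dist(a, b, t, x, gamma):
    """Squared feature-space distance between z = (1-t)*phi(a) + t*phi(b) and phi(x)."""
    zz = (1 - t) ** 2 + t ** 2 + 2 * t * (1 - t) * rbf(a, b, gamma)  # <z, z>
    zx = (1 - t) * rbf(a, x, gamma) + t * rbf(b, x, gamma)           # <z, phi(x)>
    return zz + 1.0 - 2.0 * zx


def input_sq_dist(d_feat_sq, gamma):
    """Invert the RBF relation d_feat^2 = 2 - 2*exp(-gamma * d_in^2)."""
    return -math.log(1.0 - d_feat_sq / 2.0) / gamma


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def pre_image(anchors, sq_dists):
    """Least-squares point whose squared distances to the anchors match sq_dists.

    Subtracting the equation for anchors[0] from the others linearises the
    problem: 2*(x_i - x_0) . x = ||x_i||^2 - ||x_0||^2 - d_i^2 + d_0^2.
    """
    x0, d0 = anchors[0], sq_dists[0]
    n0 = sum(v * v for v in x0)
    rows, rhs = [], []
    for xi, di in zip(anchors[1:], sq_dists[1:]):
        rows.append([2 * (u - v) for u, v in zip(xi, x0)])
        rhs.append(sum(u * u for u in xi) - n0 - di + d0)
    dim = len(x0)
    # Normal equations (A^T A) x = A^T b for the over-determined system.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(dim)] for i in range(dim)]
    Atb = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(dim)]
    return solve(AtA, Atb)


def kernel_oversample(minority, n_new, gamma, seed=0):
    """Generate n_new synthetic minority points via feature-space interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)   # two distinct minority instances
        t = rng.random()                 # interpolation weight in [0, 1)
        d_in = [input_sq_dist(feature_sq_dist(a, b, t, x, gamma), gamma)
                for x in minority]
        synthetic.append(pre_image(minority, d_in))
    return synthetic


# Toy minority class in 2-D (made-up values for illustration only).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]
synthetic = kernel_oversample(minority, n_new=3, gamma=0.5)
augmented_minority = minority + synthetic  # would then be passed to SVM training
print(len(augmented_minority))
```

In this sketch the synthetic point lives in the feature space as a convex combination of two mapped minority instances, so the interpolation happens in the same space in which the SVM separates the classes. The pre-image step uses every minority point as an anchor, which needs at least one more anchor than the input dimension and anchors that are not all collinear (or coplanar, in higher dimensions); a practical version would more likely use only the nearest neighbours.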



