A kernel-based sampling to train SVM with imbalanced data set
Zhi-qiang Zeng, Shunzhi Zhu
IEEE Conference Anthology. DOI: 10.1109/ANTHOLOGY.2013.6784693 (https://doi.org/10.1109/ANTHOLOGY.2013.6784693)
Combining out-class sampling with in-class sampling is a popular strategy for training a Support Vector Machine (SVM) classifier on imbalanced data sets. However, it may lead to inconsistency, because the sampling strategy operates in the input space while the SVM operates in the feature space. This paper presents a kernel-based over-sampling approach to overcome this drawback. The method first preprocesses the data using both in-class and out-class sampling to generate minority instances in the feature space. The pre-images of these synthetic samples are then found from a distance relation between the input space and the feature space. Finally, the pre-images are appended to the original minority class data set, and an SVM is trained on the result. Experiments on real data sets indicate that the samples generated by the proposed strategy are of higher quality than those produced by existing over-sampling techniques. As a result, SVM classification on imbalanced data sets becomes more effective.
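The pipeline described above (synthesize in feature space, map back through a distance relation, append to the minority class) can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes an RBF kernel, a SMOTE-style interpolation between two minority points in feature space, and a simplified pre-image step that restricts the pre-image to the input-space line through the two parent points. The function names, `gamma`, and `delta` are all hypothetical.

```python
import math
import random


def rbf(x, y, gamma):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2), so k(x, x) = 1."""
    return math.exp(-gamma * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def input_sq_dist(a, b, delta, x, gamma):
    """Approximate squared input-space distance from the pre-image of
    z = (1 - delta) * phi(a) + delta * phi(b) to a known point x."""
    kab = rbf(a, b, gamma)
    # <z, z> and <z, phi(x)> expressed through kernel evaluations only.
    zz = (1 - delta) ** 2 + delta ** 2 + 2 * delta * (1 - delta) * kab
    zx = (1 - delta) * rbf(a, x, gamma) + delta * rbf(b, x, gamma)
    feat_sq = zz - 2 * zx + 1.0  # ||z - phi(x)||^2 in feature space
    # Assumption: treat the pre-image as having unit norm in feature space,
    # then invert the RBF distance relation d_F^2 = 2 - 2 * k.
    k_hat = min(max(1.0 - feat_sq / 2.0, 1e-12), 1.0)
    return -math.log(k_hat) / gamma


def synth_preimage(a, b, delta, gamma):
    """Simplified pre-image: the point on the line through a and b whose
    distances to a and b best match the recovered input-space distances."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    if d2 == 0.0:
        return list(a)
    da2 = input_sq_dist(a, b, delta, a, gamma)
    db2 = input_sq_dist(a, b, delta, b, gamma)
    t = (da2 - db2 + d2) / (2.0 * d2)
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]


def oversample(minority, n_new, gamma, seed=0):
    """Return the minority set extended with n_new synthetic pre-images."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        synthetic.append(synth_preimage(a, b, rng.random(), gamma))
    return minority + synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]]
augmented = oversample(minority, n_new=4, gamma=0.5)
```

The augmented minority set would then be merged with the majority class and passed to any SVM trainer that uses the same kernel. A fuller pre-image step would use distances to several neighbours rather than only the two parent points.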