Support Vector Machine accuracy improvement with k-means clustering

2013 International Computer Science and Engineering Conference (ICSEC) | Pub Date: 2013-09-01 | DOI: 10.1109/ICSEC.2013.6694782

Teera Siriteerakul, V. Boonjing

Citations: 4

Abstract

A Support Vector Machine (SVM) is a classifier that, in its original form, uses a hyperplane as the boundary separating two classes of data in a high-dimensional space. If the data in each class do not form a single cluster, however, the two classes may not be linearly separable. Researchers typically address this by replacing the hyperplane with a more complex boundary via the kernel trick, but kernels can lengthen training time, yield only a minute improvement in accuracy, or both. On the other hand, if the data in one class are divided into subclasses according to their proximity, the resulting subclasses should be easily separated by hyperplanes. This paper therefore proposes a method to improve the accuracy of a linear SVM by first applying k-means clustering to each class of the input data. After clustering, a multiclass linear SVM is trained with each subclass treated as a separate class. The trained SVM assigns any new input to a subclass, which maps directly to its parent class. For evaluation, the method is applied to classifying images of Thai characters, where the multiple fonts of each character can be regarded as hidden clusters within a class. Empirically, the proposed method achieved an accuracy improvement of over 6% compared with a linear SVM or SVMs with an RBF or polynomial kernel.
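
The pipeline described in the abstract (cluster each class, train a multiclass linear classifier on the subclasses, map predictions back to the parent class) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses only the Python standard library, a plain Lloyd's k-means, and a simple one-vs-rest hinge-loss trainer standing in for whatever linear SVM solver the paper used. The function names, the fixed `k=2`, the hyperparameters, and the XOR-style toy data are all hypothetical choices made for the example.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centers = random.Random(seed).sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """One-vs-rest linear classifier trained by batch subgradient descent
    on the L2-regularised hinge loss (an assumed stand-in for an SVM solver)."""
    n, dim = len(X), len(X[0])
    models = {}
    for label in sorted(set(y)):
        w, b = [0.0] * dim, 0.0
        for _ in range(epochs):
            grad_w, grad_b = [lam * wi for wi in w], 0.0
            for x, yi in zip(X, y):
                t = 1.0 if yi == label else -1.0
                if t * (dot(w, x) + b) < 1.0:  # margin violated
                    grad_w = [g - t * xi / n for g, xi in zip(grad_w, x)]
                    grad_b -= t / n
            w = [wi - lr * g for wi, g in zip(w, grad_w)]
            b -= lr * grad_b
        models[label] = (w, b)
    return models

def fit_clustered_svm(X, y, k):
    """Split every class into k subclasses, then train on (class, cluster) labels."""
    sub_labels = [None] * len(X)
    for cls in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == cls]
        for i, cluster in zip(idx, kmeans([X[j] for j in idx], k)):
            sub_labels[i] = (cls, cluster)
    return train_linear_svm(X, sub_labels)

def predict(models, x):
    """Pick the best-scoring subclass and return its parent class."""
    best = max(models, key=lambda lab: dot(models[lab][0], x) + models[lab][1])
    return best[0]

def make_xor_data():
    """Toy data: each class consists of two well-separated blobs (XOR layout)."""
    centers = {0: [(-2, -2), (2, 2)], 1: [(-2, 2), (2, -2)]}
    offsets = (-0.5, 0.0, 0.5)
    X, y = [], []
    for cls, blobs in centers.items():
        for cx, cy in blobs:
            for dx in offsets:
                for dy in offsets:
                    X.append([cx + dx, cy + dy])
                    y.append(cls)
    return X, y

X, y = make_xor_data()
models = fit_clustered_svm(X, y, k=2)
print([predict(models, p) for p in ([2, 2], [-2, -2], [-2, 2], [2, -2])])
```

In this toy layout no single hyperplane separates the two classes, but each of the four blobs can be separated from the other three, which is the intuition the abstract appeals to. With real data the number of clusters per class would need to be chosen, for example by validation.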
