Efficient Training Data Reduction for SVM based Handwritten Digits Recognition

I. Javed, M. N. Ayyaz, W. Mehmood
{"title":"Efficient Training Data Reduction for SVM based Handwritten Digits Recognition","authors":"I. Javed, M. N. Ayyaz, W. Mehmood","doi":"10.1109/ICEE.2007.4287360","DOIUrl":null,"url":null,"abstract":"Support vector machine (SVM) are binary classifiers that make any two classes linearly separable by finding a maximum-margin hyper-plane between the data samples of the two classes in a given feature space. Once the discrimination function of this hyper-plane has been found during the training stage, any unknown sample can be classified by checking the sign of this discrimination function for the unknown sample. It is well understood in SVM theory that the equation of SVM discrimination function is largely determined by data points close to the decision boundary. These data points close to the decision boundary are called as support vectors (SV). SVM training process for large data sets is often a time consuming process. Hence reducing the original data to contain only the SVs is a useful goal for speeding up the training process. This reduction of training data should not affect the accuracy of SVM classifier. In this paper, we propose an efficient training data reduction algorithm (Peer-SV) for SVM classifiers. The algorithm is based on the observation that the desired support vectors are those data points which are of opposite classes and whose diametric sphere does not contain any other class instance of the two classes. We have found these SVs in an efficient way i.e. computing the SVs between the peer classes only and removing the farthest points earlier to retain the border points. The algorithm has been tested on handwritten digits data sets. 
The results obtained on the total data and on the reduced data shows the accuracy of the adopted approach.","PeriodicalId":291800,"journal":{"name":"2007 International Conference on Electrical Engineering","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 International Conference on Electrical Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICEE.2007.4287360","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

Support vector machines (SVMs) are binary classifiers that make two classes linearly separable by finding a maximum-margin hyperplane between the data samples of the two classes in a given feature space. Once the discriminant function of this hyperplane has been found during the training stage, any unknown sample can be classified by checking the sign of the discriminant function at that sample. It is well understood in SVM theory that the equation of the SVM discriminant function is largely determined by the data points close to the decision boundary; these points are called support vectors (SVs). SVM training on large data sets is often time consuming, so reducing the original data to contain only the SVs is a useful way to speed up training, provided the reduction does not affect the accuracy of the SVM classifier. In this paper, we propose an efficient training data reduction algorithm (Peer-SV) for SVM classifiers. The algorithm is based on the observation that the desired support vectors are pairs of data points of opposite classes whose diametric sphere contains no other instance of either class. We find these SVs efficiently by computing them between peer classes only and removing the farthest points early so that the border points are retained. The algorithm has been tested on handwritten digits data sets. The results obtained on the full data and on the reduced data show the accuracy of the adopted approach.
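The core selection criterion described above — keep a pair of opposite-class points only if the sphere whose diameter is the segment joining them contains no other point of either class — can be sketched as follows. This is a minimal illustrative implementation of that geometric test, not the authors' actual Peer-SV code; the function name `peer_sv_candidates` and the brute-force pairwise loop are assumptions for clarity (the paper's algorithm additionally prunes the farthest points early to reduce this cost).

```python
import numpy as np

def peer_sv_candidates(X_a, X_b):
    """Return indices of points in X_a and X_b that form at least one
    opposite-class pair whose diametric sphere contains no other point
    of either class -- i.e. candidate support vectors."""
    keep_a, keep_b = set(), set()
    all_pts = np.vstack([X_a, X_b])
    for i, x in enumerate(X_a):
        for j, y in enumerate(X_b):
            # Sphere with the pair (x, y) as its diameter.
            center = (x + y) / 2.0
            radius = np.linalg.norm(x - y) / 2.0
            # Distance from every point to the sphere's center.
            d = np.linalg.norm(all_pts - center, axis=1)
            # The pair itself lies on the sphere; any other point
            # strictly inside the sphere disqualifies the pair.
            inside = d < radius - 1e-9
            inside[i] = False
            inside[len(X_a) + j] = False
            if not inside.any():
                keep_a.add(i)
                keep_b.add(j)
    return sorted(keep_a), sorted(keep_b)
```

For example, a point far behind its own class is never part of such an empty diametric sphere (a nearer same-class point always falls inside it), so it is dropped while the border points survive — which is exactly the reduction the paper aims for before SVM training.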