Concept boundary detection for speeding up SVMs

Proceedings of the 23rd International Conference on Machine Learning (ICML 2006). Pub Date: 2006-06-25. DOI: 10.1145/1143844.1143930

Navneet Panda, E. Chang, Gang Wu

Citations: 59

Abstract

Support Vector Machines (SVMs) suffer from an O(n^2) training cost, where n denotes the number of training instances. In this paper, we propose an algorithm that selects boundary instances as training data to substantially reduce n. Our algorithm is motivated by the result of (Burges, 1999) that removing non-support vectors from the training set does not change the SVM training result. Our algorithm therefore eliminates instances that are likely to be non-support vectors. In the concept-independent preprocessing step, we prepare nearest-neighbor lists for the training instances. In the concept-specific sampling step, we then use these lists to select useful training data for each target concept. Empirical studies show that our algorithm effectively reduces n and outperforms competing downsampling algorithms without significantly compromising testing accuracy.
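The two-step structure described above (label-free neighbor lists computed once, then a cheap per-concept selection) can be illustrated with a minimal sketch. This is not the authors' algorithm. The selection rule used here is a simple illustrative heuristic: keep an instance if any of its k nearest neighbors has a different label for the current concept. It assumes Euclidean distance and brute-force neighbor search. The function names `build_neighbor_lists` and `select_boundary` and the parameter `k` are hypothetical.

```python
import math
from typing import List, Sequence


def build_neighbor_lists(points: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Concept-independent step (illustrative): for each instance, return the
    indices of its k nearest other instances under Euclidean distance.
    Labels are not used, so the lists can be reused for every target concept.
    Brute force: O(n^2) distance computations, done once."""
    neighbors = []
    for i, p in enumerate(points):
        # Pair each other instance's distance with its index; ties break by index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def select_boundary(neighbors: List[List[int]], labels: Sequence[int]) -> List[int]:
    """Concept-specific step (hypothetical heuristic, not the paper's rule):
    keep instance i if any of its precomputed neighbors carries a different
    label for this concept. Instances deep inside one class are dropped as
    likely non-support vectors."""
    return [
        i
        for i, nbrs in enumerate(neighbors)
        if any(labels[j] != labels[i] for j in nbrs)
    ]


if __name__ == "__main__":
    # Eight evenly spaced 1-D points; the neighbor lists are built once.
    pts = [[float(x)] for x in range(8)]
    nb = build_neighbor_lists(pts, k=2)
    # Concept A: classes split between points 3 and 4.
    print(select_boundary(nb, [0, 0, 0, 0, 1, 1, 1, 1]))
    # Concept B reuses the same lists without recomputing distances.
    print(select_boundary(nb, [0, 0, 0, 0, 0, 0, 1, 1]))
```

The expensive neighbor computation is paid once and amortized across concepts. Only the reduced subset returned by `select_boundary` would then be passed to an SVM trainer.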
