Leverage Classifier: Another Look at Support Vector Machine
Yixin Han, Jun Yu, Nan Zhang, Cheng Meng, Ping Ma, Wenxuan Zhong, Changliang Zou
Statistica Sinica, DOI: 10.5705/ss.202023.0124 (published 2023-08-23)
Abstract
Support vector machine (SVM) is a popular classifier known for its accuracy, flexibility, and robustness. However, its high computational cost hinders its application to large-scale datasets. In this paper, we propose a new optimal leverage classifier based on linear SVM under a nonseparable setting. The classifier selects an informative subset of the training sample to reduce the data size, enabling efficient computation while maintaining high accuracy. We take a novel view of SVM under a general subsampling framework and rigorously investigate its statistical properties. We propose a two-step subsampling procedure consisting of a pilot estimation of the optimal subsampling probabilities and a subsampling step that constructs the classifier. We develop a new Bahadur representation of the SVM coefficients and derive the unconditional asymptotic distribution and the optimal subsampling probabilities without relying on the full sample. Numerical results demonstrate that our classifier outperforms existing methods in terms of estimation, computation, and prediction.
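The two-step procedure described above (pilot fit, then informed subsampling) can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration only: the synthetic data, the margin-based scoring rule (used as a stand-in for the paper's optimal subsampling probabilities derived from the Bahadur representation), and all tuning choices (pilot size, subsample size, regularization) are assumptions for illustration, not the authors' exact construction.

```python
# Minimal sketch of a two-step subsampling procedure for linear SVM.
# NOTE: the scoring rule below (margin violations under a pilot fit) is a
# heuristic stand-in for the paper's optimal subsampling probabilities.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic nonseparable data (hypothetical example, not from the paper).
n, d = 100_000, 10
X = rng.normal(size=(n, d))
beta_true = rng.normal(size=d)
y = np.where(X @ beta_true + rng.normal(scale=2.0, size=n) > 0, 1, -1)

# Step 1: pilot estimation from a small uniform subsample.
n_pilot = 1_000
pilot_idx = rng.choice(n, size=n_pilot, replace=False)
pilot = LinearSVC(C=1.0, dual=False, max_iter=10_000).fit(X[pilot_idx], y[pilot_idx])

# Leverage-style scores: points near or violating the pilot margin get
# larger scores, scaled by the norm of their gradient contribution.
margin = y * pilot.decision_function(X)
score = np.where(margin < 1.0, np.linalg.norm(X, axis=1), 0.0) + 1e-6
probs = score / score.sum()

# Step 2: subsample with these probabilities and refit, reweighting each
# point by the inverse of its inclusion probability.
n_sub = 5_000
sub_idx = rng.choice(n, size=n_sub, replace=True, p=probs)
weights = 1.0 / (n * probs[sub_idx])
clf = LinearSVC(C=1.0, dual=False, max_iter=10_000).fit(
    X[sub_idx], y[sub_idx], sample_weight=weights
)

print("pilot accuracy:", pilot.score(X, y))
print("subsampled-classifier accuracy:", clf.score(X, y))
```

In this sketch, observations that violate the pilot margin receive higher sampling probability, reflecting the intuition that margin-active points carry most of the information about the SVM coefficients; the inverse-probability weights keep the subsampled objective an (approximately) unbiased estimate of the full-sample objective.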
About the Journal
Statistica Sinica aims to meet the needs of statisticians in a rapidly changing world. It provides a forum for the publication of innovative work of high quality in all areas of statistics, including theory, methodology and applications. The journal encourages the development and principled use of statistical methodology that is relevant for society, science and technology.