An Investigation on Linear SVM and its Variants for Text Categorization

2010 Second International Conference on Machine Learning and Computing Pub Date : 2010-02-09 DOI:10.1109/ICMLC.2010.64

M. A. Kumar, M. Gopal

Citations: 13

Abstract

Linear Support Vector Machines (SVMs) have been used successfully to classify text documents into a set of concepts. With the increasing number of publicly available linear SVM formulations and decomposition algorithms, this paper studies their efficiency and efficacy for text categorization tasks. Eight publicly available implementations are compared in terms of Break Even Point (BEP), F1 measure, ROC plots, learning speed, and sensitivity to the penalty parameter, based on experimental results on two benchmark text corpora. The results show that, of the eight implementations, SVMlin and Proximal SVM perform better in terms of consistent performance and reduced training time. However, being an extremely simple algorithm whose training time is independent of the penalty parameter and of the category being trained, Proximal SVM is appealing. We further investigated fuzzy proximal SVM on both text corpora; it showed improved generalization over proximal SVM.
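To illustrate why the training time of a linear Proximal SVM does not depend on the penalty parameter, the following is a minimal sketch of the standard linear proximal SVM formulation of Fung and Mangasarian, in which training reduces to a single linear solve of size (n + 1) for n features. It is not the implementation evaluated in the paper; the function names `psvm_fit` and `psvm_predict` and the parameter name `nu` are hypothetical choices made for this example, and dense NumPy arrays are assumed for brevity (real text data would normally be sparse).

```python
import numpy as np


def psvm_fit(A, d, nu=1.0):
    """Fit a linear proximal SVM (illustrative sketch, not the paper's code).

    A  : (m, n) array of document feature vectors
    d  : (m,) array of labels in {+1, -1}
    nu : penalty parameter (hypothetical name); it only rescales the
         diagonal term, so the cost of the solve does not change with it
    """
    A = np.asarray(A, dtype=float)
    d = np.asarray(d, dtype=float)
    m, n = A.shape
    # E = [A, -e], where e is a column of ones
    E = np.hstack([A, -np.ones((m, 1))])
    # Solve (I / nu + E^T E) [w; gamma] = E^T D e, with D = diag(d), so D e = d
    lhs = np.eye(n + 1) / nu + E.T @ E
    rhs = E.T @ d
    z = np.linalg.solve(lhs, rhs)
    return z[:n], z[n]


def psvm_predict(A, w, gamma):
    """Classify rows of A with the separating plane x^T w = gamma."""
    scores = np.asarray(A, dtype=float) @ w - gamma
    return np.where(scores >= 0, 1, -1)
```

Under these assumptions, changing `nu` alters only the values on the diagonal of the system, not its size, which is consistent with the abstract's remark about training time.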

