An Investigation on Linear SVM and its Variants for Text Categorization

M. A. Kumar, M. Gopal
Published in: 2010 Second International Conference on Machine Learning and Computing, 2010-02-09. DOI: 10.1109/ICMLC.2010.64 (https://doi.org/10.1109/ICMLC.2010.64)
Citations: 13

Abstract

Linear Support Vector Machines (SVMs) have been used successfully to classify text documents into a set of concepts. With the increasing number of publicly available linear SVM formulations and decomposition algorithms, this paper studies their efficiency and efficacy for text categorization tasks. Eight publicly available implementations are compared in terms of Break Even Point (BEP), F1 measure, ROC plots, learning speed, and sensitivity to the penalty parameter, based on experimental results on two benchmark text corpora. The results show that, of the eight implementations, SVMlin and Proximal SVM perform better in terms of consistent performance and reduced training time. Of the two, Proximal SVM is particularly appealing: it is an extremely simple algorithm whose training time is independent of both the penalty parameter and the category being trained. We further investigated fuzzy Proximal SVM on both text corpora; it showed improved generalization over Proximal SVM.
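For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how F1 and the precision/recall Break Even Point are commonly computed for a single category. It is an illustration, not the paper's evaluation code: the function names, the toy labels and scores, and the zero decision threshold are all assumptions. BEP is taken here in its usual simple form, the precision obtained when exactly as many documents are labelled positive as there are true positives, at which point precision equals recall. The paper may use an interpolated variant.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = in category, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def break_even_point(y_true, scores):
    """Precision (= recall) when the top-k scored documents are labelled
    positive, with k equal to the number of truly positive documents."""
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    # Rank documents by classifier score, highest first.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_pos])
    return hits / n_pos


# Hypothetical toy data: true labels and linear-SVM decision values.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.1, -0.5, -1.0]

# Assumed decision rule: predict "in category" when the score is positive.
y_pred = [1 if s > 0 else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
bep = break_even_point(y_true, scores)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} bep={bep:.3f}")
```

F1 depends on the chosen decision threshold, whereas BEP depends only on how the classifier ranks the documents, which is one reason to report both.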