{"title":"An Investigation on Linear SVM and its Variants for Text Categorization","authors":"M. A. Kumar, M. Gopal","doi":"10.1109/ICMLC.2010.64","DOIUrl":null,"url":null,"abstract":"Linear Support Vector Machines (SVMs) have been used successfully to classify text documents into set of concepts. With the increasing number of linear SVM formulations and decomposition algorithms publicly available, this paper performs a study on their efficiency and efficacy for text categorization tasks. Eight publicly available implementations are investigated in terms of Break Even Point (BEP), F1 measure, ROC plots, learning speed and sensitivity to penalty parameter, based on the experimental results on two benchmark text corpuses. The results show that out of the eight implementations, SVMlin and Proximal SVM perform better in terms of consistent performance and reduced training time. However being an extremely simple algorithm with training time independent of the penalty parameter and the category for which training is being done, Proximal SVM is appealing. We further investigated fuzzy proximal SVM on both the text corpuses; it showed improved generalization over proximal SVM.","PeriodicalId":423912,"journal":{"name":"2010 Second International Conference on Machine Learning and Computing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 Second International Conference on Machine Learning and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLC.2010.64","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13
Abstract
Linear Support Vector Machines (SVMs) have been used successfully to classify text documents into set of concepts. With the increasing number of linear SVM formulations and decomposition algorithms publicly available, this paper performs a study on their efficiency and efficacy for text categorization tasks. Eight publicly available implementations are investigated in terms of Break Even Point (BEP), F1 measure, ROC plots, learning speed and sensitivity to penalty parameter, based on the experimental results on two benchmark text corpuses. The results show that out of the eight implementations, SVMlin and Proximal SVM perform better in terms of consistent performance and reduced training time. However being an extremely simple algorithm with training time independent of the penalty parameter and the category for which training is being done, Proximal SVM is appealing. We further investigated fuzzy proximal SVM on both the text corpuses; it showed improved generalization over proximal SVM.