Semi-supervised Text Classification Using SVM with Exponential Kernel

Liyun Zhong

International Journal of Database Theory and Application, pp. 79-88. Published 2017-01-31.
DOI: https://doi.org/10.14257/IJDTA.2017.10.1.08
Citations: 1

Abstract

Kernel-based learning methods (kernel methods for short) in general, and support vector machines (SVMs) in particular, have been successfully applied to text classification. This is mainly due to their relatively high classification accuracy in several application domains and their ability to handle high-dimensional, sparse data, which is the prohibitive characteristic of textual data representations. A significant challenge in text classification is to reduce the need for labeled training data while maintaining acceptable performance. This paper presents a semi-supervised technique that uses the exponential kernel for text classification. Specifically, the semantic similarities between terms are first determined from both labeled and unlabeled training data by means of a diffusion process on a graph defined by lexicon and co-occurrence information, and the exponential kernel is then constructed from the learned semantic similarities. Finally, the SVM classifier trains a model for each class during the training phase, and this model is then applied to all test examples in the test phase. The main feature of this approach is that it uses the exponential kernel to reveal the semantic similarities between terms in an unsupervised manner, which provides a kernel framework for semi-supervised learning. The proposed approach is demonstrated on several benchmark data sets for text classification, and the experimental results show that it can significantly improve classification performance.
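
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the kernel formula, so the code assumes the common form S = exp(βG), where G = DᵀD is a term co-occurrence matrix built from a document-term matrix D, and a document kernel k(a, b) = aᵀSb. The function names, the value of `beta`, and the toy vocabulary are all hypothetical, and the lexicon information mentioned in the abstract is left out.

```python
import numpy as np

def exponential_term_kernel(doc_term, beta=0.5):
    """Return S = exp(beta * G), with G = D^T D (assumed co-occurrence graph)."""
    # G[i, j] measures how strongly terms i and j co-occur over ALL documents,
    # labeled and unlabeled alike; class labels are not used in this step.
    G = doc_term.T @ doc_term
    # G is symmetric, so exp(beta * G) = V diag(exp(beta * w)) V^T.
    w, V = np.linalg.eigh(G)
    return (V * np.exp(beta * w)) @ V.T

def document_gram(docs_a, docs_b, S):
    """Kernel between documents: k(a, b) = a^T S b."""
    return docs_a @ S @ docs_b.T

# Toy corpus: rows are documents, columns are terms
# (hypothetical vocabulary: "car", "auto", "engine", "banana").
D = np.array([
    [1.0, 0.0, 1.0, 0.0],  # doc 0: car, engine
    [0.0, 1.0, 1.0, 0.0],  # doc 1: auto, engine
    [0.0, 0.0, 0.0, 1.0],  # doc 2: banana
])

S = exponential_term_kernel(D, beta=0.5)  # term-term semantic similarity
K = document_gram(D, D, S)                # document-document Gram matrix
```

In this toy corpus "car" and "auto" never appear in the same document, yet `S` assigns them a positive similarity because both co-occur with "engine"; this is the diffusion effect the abstract refers to. A Gram matrix such as `K` could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, training one binary model per class.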