A Novel Text Representation Model for Text Classification

Jun Wang, Yiming Zhou
Published in: 2008 First International Conference on Intelligent Networks and Intelligent Systems, 2008-11-01. DOI: 10.1109/ICINIS.2008.21 (https://doi.org/10.1109/ICINIS.2008.21)
Citations: 1

Abstract

Text in text classification is usually represented as a sequence of terms. When the number of terms becomes very large, existing text categorization methods become highly time-consuming. In this paper we present a novel text representation model for text classification that greatly reduces the required resources. The model represents a text with several features, each corresponding to a theme that emerges from a set of related articles. We also introduce an efficient way to build the model. The proposed model has been applied to a naive Bayes classifier, and experiments on the Reuters-21578 corpus show that efficiency is greatly improved without sacrificing classification accuracy, even when the dimension of the input space is significantly reduced.
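The abstract does not say how themes are extracted or how a text is mapped onto them, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each theme is given as a small set of related articles, assigns every term to the theme in which it is relatively most frequent, represents a text as per-theme term counts, and trains a multinomial naive Bayes classifier on those low-dimensional vectors. All names (`build_term_to_theme`, `to_theme_vector`, `ThemeNaiveBayes`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def build_term_to_theme(theme_articles):
    """Assumed heuristic: map each term to the theme where its relative frequency is highest."""
    freq = {theme: Counter(w for doc in docs for w in doc.lower().split())
            for theme, docs in theme_articles.items()}
    totals = {theme: sum(counts.values()) for theme, counts in freq.items()}
    vocab = set().union(*freq.values())
    return {term: max(freq, key=lambda th: freq[th][term] / totals[th])
            for term in vocab}


def to_theme_vector(text, term_to_theme, themes):
    """Represent a text as one count per theme instead of one count per term."""
    counts = Counter(term_to_theme[w] for w in text.lower().split()
                     if w in term_to_theme)
    return [counts[th] for th in themes]


class ThemeNaiveBayes:
    """Multinomial naive Bayes over theme-count vectors, with Laplace smoothing."""

    def fit(self, vectors, labels):
        n_features = len(vectors[0])
        self.classes = sorted(set(labels))
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(label_counts[c] / len(labels))
                          for c in self.classes}
        sums = {c: [0] * n_features for c in self.classes}
        for vec, label in zip(vectors, labels):
            for i, value in enumerate(vec):
                sums[label][i] += value
        self.log_like = {}
        for c in self.classes:
            total = sum(sums[c]) + n_features  # add-one smoothing
            self.log_like[c] = [math.log((s + 1) / total) for s in sums[c]]
        return self

    def predict(self, vec):
        def score(c):
            return self.log_prior[c] + sum(
                v * ll for v, ll in zip(vec, self.log_like[c]))
        return max(self.classes, key=score)


# Toy data (hypothetical, not taken from Reuters-21578).
theme_articles = {
    "grain": ["wheat corn harvest exports", "corn crop wheat tonnes"],
    "money": ["bank interest rates dollar", "dollar currency bank loans"],
}
themes = sorted(theme_articles)
term_to_theme = build_term_to_theme(theme_articles)

train_texts = ["wheat harvest rises", "bank cuts rates",
               "corn exports fall", "dollar loans grow"]
train_labels = ["agriculture", "finance", "agriculture", "finance"]
train_vectors = [to_theme_vector(t, term_to_theme, themes) for t in train_texts]

model = ThemeNaiveBayes().fit(train_vectors, train_labels)
prediction = model.predict(
    to_theme_vector("wheat crop tonnes", term_to_theme, themes))
print(prediction)
```

In this sketch the classifier sees two theme features rather than the twelve distinct terms in the toy theme articles, which illustrates how a theme-based representation can shrink the input space. Whether accuracy is preserved on a real corpus depends on how the themes are built.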