A Novel Text Representation Model for Text Classification

2008 First International Conference on Intelligent Networks and Intelligent Systems. Pub Date: 2008-11-01. DOI: 10.1109/ICINIS.2008.21

Jun Wang, Yiming Zhou

Citations: 1

Abstract

Text classification usually represents a text as a sequence of terms. As the number of terms grows very large, existing text categorization methods become highly time-consuming. In this paper we present a novel text representation model for text classification that greatly reduces the required resources. The model represents a text with several features, each corresponding to a theme that emerges from a set of related articles. We also introduce an efficient way to build the model. The proposed model has been applied to a naive Bayes classifier, and experiments on the Reuters-21578 corpus show that efficiency is greatly improved without sacrificing classification accuracy, even when the dimension of the input space is significantly reduced.
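The idea of replacing per-term features with a handful of theme features can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not say how themes are derived or how a text is scored against them. The sketch assumes each theme is simply a hand-written set of terms, that a feature value is the count of a text's tokens falling in that set, and that the classifier is a multinomial naive Bayes with add-one smoothing. `THEMES`, `theme_vector`, and `ThemeNaiveBayes` are hypothetical names introduced only for illustration.

```python
import math
from collections import Counter, defaultdict

# ASSUMPTION: themes are given as plain term sets. In the paper they emerge
# from sets of related articles; that construction is not shown here.
THEMES = {
    "finance": {"bank", "loan", "rate", "profit"},
    "grain": {"wheat", "corn", "harvest", "tonnes"},
    "energy": {"oil", "crude", "barrel", "opec"},
}
THEME_NAMES = sorted(THEMES)  # fixed feature order: energy, finance, grain


def theme_vector(text):
    """Map a text to one count per theme instead of one count per term."""
    tokens = text.lower().split()
    return [sum(tok in THEMES[name] for tok in tokens) for name in THEME_NAMES]


class ThemeNaiveBayes:
    """Multinomial naive Bayes over theme counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.theme_counts = defaultdict(lambda: [0] * len(THEME_NAMES))
        for text, label in zip(texts, labels):
            for i, count in enumerate(theme_vector(text)):
                self.theme_counts[label][i] += count
        return self

    def predict(self, text):
        vec = theme_vector(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.theme_counts[label]
            denom = sum(counts) + len(THEME_NAMES)
            # log prior plus smoothed log likelihood of each theme count
            score = math.log(n_docs / total_docs)
            for i, count in enumerate(vec):
                score += count * math.log((counts[i] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_texts = [
    "bank raises loan rate",
    "wheat harvest tonnes",
    "crude oil barrel opec",
    "profit bank",
]
train_labels = ["money", "grain", "crude", "money"]
model = ThemeNaiveBayes().fit(train_texts, train_labels)
```

With this toy setup every text becomes a three-dimensional vector regardless of vocabulary size, which is the kind of dimensionality reduction the abstract refers to; the efficiency and accuracy results reported in the paper are not reproduced by this sketch.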



