Building Vietnamese Topic Modeling Based on Core Terms and Applying in Text Classification

2015 Fifth International Conference on Communication Systems and Network Technologies. Pub Date: 2015-04-04. DOI: 10.1109/CSNT.2015.22

Ha Nguyen Thi Thu, Tinh Dao Thanh, T. Hai, Vinh Ho Ngoc

Citations: 8

Abstract

In any language, the words that occur in a text indicate the meaning of its content. Generative models for text, such as the topic model, have the potential to make important contributions to the statistical analysis of large document collections and to a deeper understanding of human language learning and processing. In this paper, we propose a novel method for building a Vietnamese topic model based on core terms and conditional probability. This approach reduces the time needed to build the corpus. We then apply the model to Vietnamese text classification. The experiments show that the corpus makes the classification system more effective than traditional methods, with higher accuracy and less complex data processing.
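
The abstract names the ingredients (core terms and conditional probability) but not the procedure. The sketch below is only one plausible reading and is not the authors' method. It assumes pre-segmented, topic-labeled documents, estimates P(topic | term) from document counts, keeps the highest-scoring terms per topic as "core terms", and classifies a new text by summing the scores of the core terms it contains. The function names `core_terms` and `classify`, the `top_k` parameter, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def core_terms(docs, top_k=2):
    """Return {topic: {term: P(topic | term)}} for the top_k terms per topic.

    docs is a list of (topic, tokens) pairs; tokens are already word-segmented.
    This is an illustrative assumption, not the procedure from the paper.
    """
    term_topic = defaultdict(Counter)  # term -> number of docs per topic
    for topic, tokens in docs:
        for term in set(tokens):  # count each term once per document
            term_topic[term][topic] += 1

    scored = defaultdict(list)  # topic -> [(term, probability, doc count)]
    for term, by_topic in term_topic.items():
        total = sum(by_topic.values())
        for topic, n in by_topic.items():
            scored[topic].append((term, n / total, n))

    model = {}
    for topic, rows in scored.items():
        # Rank by conditional probability, then by frequency, then by term.
        rows.sort(key=lambda r: (-r[1], -r[2], r[0]))
        model[topic] = {term: p for term, p, _ in rows[:top_k]}
    return model


def classify(tokens, model):
    """Pick the topic whose core terms contribute the largest total score."""
    present = set(tokens)
    totals = {
        topic: sum(p for term, p in terms.items() if term in present)
        for topic, terms in model.items()
    }
    return max(totals, key=totals.get)


# Toy, hand-made data (hypothetical): underscores join multi-syllable words.
docs = [
    ("sports", ["bóng_đá", "cầu_thủ", "trận_đấu", "việt_nam"]),
    ("sports", ["bóng_đá", "huấn_luyện_viên", "trận_đấu"]),
    ("economy", ["ngân_hàng", "lãi_suất", "thị_trường", "việt_nam"]),
    ("economy", ["ngân_hàng", "cổ_phiếu", "thị_trường"]),
]
model = core_terms(docs)
label = classify(["trận_đấu", "việt_nam"], model)
print(model)
print(label)
```

In this toy setup a term shared by both topics, such as `việt_nam`, gets a conditional probability of 0.5 and is ranked below terms that occur in only one topic, which is the intuition behind selecting a small set of discriminative terms per topic.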

