Automatic text categorization using a system of high-precision and high-recall models

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date: 2014-12-01 DOI: 10.1109/CIDM.2014.7008692

Dai Li, Y. Murphey

Citations: 3

Abstract

This paper presents HPHR, an automatic text document categorization system. HPHR contains high-precision, high-recall, and noise-filtered text categorization models. The models are generated through a suite of machine learning algorithms, a fast clustering algorithm that efficiently and effectively groups documents into subcategories, and a text category generation algorithm that automatically generates, from a given set of training documents, the text subcategories that represent the high-precision, high-recall, and noise-filtered models. The HPHR system was evaluated on documents drawn from two different applications: vehicle fault diagnostic documents, which take the form of unstructured, verbatim text descriptions, and the Reuters corpus. On both document collections, HPHR outperformed systems commonly used in text document categorization.
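The abstract does not define how precision and recall are measured, so the following is only a minimal sketch of the standard per-category definitions that the terms "high precision" and "high recall" usually refer to. It is not the authors' HPHR implementation; the function name `precision_recall` and the sample labels are hypothetical and chosen purely for illustration.

```python
def precision_recall(true_labels, predicted_labels, category):
    """Return (precision, recall) for one category.

    Standard definitions, assumed here rather than taken from the paper:
      precision = TP / (TP + FP)
      recall    = TP / (TP + FN)
    """
    pairs = list(zip(true_labels, predicted_labels))
    # Documents correctly assigned to the category.
    tp = sum(1 for t, p in pairs if t == category and p == category)
    # Documents assigned to the category that belong elsewhere.
    fp = sum(1 for t, p in pairs if t != category and p == category)
    # Documents of the category that were assigned elsewhere.
    fn = sum(1 for t, p in pairs if t == category and p != category)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for four documents (not data from the paper).
true_labels = ["brake", "brake", "engine", "engine"]
predicted_labels = ["brake", "engine", "engine", "engine"]

brake_scores = precision_recall(true_labels, predicted_labels, "brake")
engine_scores = precision_recall(true_labels, predicted_labels, "engine")
```

In this toy case the "brake" model is precise but misses one document, while the "engine" model finds every document but also accepts a wrong one. That trade-off is the usual motivation for pairing a high-precision model with a high-recall one.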


