Frequent pattern-growth approach for document organization

Ontologies and Information Systems for the Semantic Web. Pub Date: 2008-10-30. DOI: 10.1145/1458484.1458496

Monika Akbar, R. Angryk

Citations: 17

Abstract

In this paper, we propose a document clustering mechanism that relies on the frequent senses appearing in the documents rather than on the co-occurrence of frequent keywords. Instead of representing each document as a collection of keywords, we use a document-graph that reflects a conceptual hierarchy of the keywords related to that document. We combine a graph mining approach with FP-growth, a well-known association rule mining procedure, to discover the frequent subgraphs among the document-graphs. The similarity of documents is measured by the number of frequent subgraphs appearing in the corresponding document-graphs. We believe that our novel approach allows us to cluster the documents by their senses rather than by the actual keywords.
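The similarity measure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it treats single edges as the simplest possible subgraphs, swaps plain support counting in for FP-growth, and reads "similarity" as the number of frequent edges two document-graphs share. The toy `doc_graphs` data and the names `frequent_edges`, `similarity`, and `min_support` are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical document-graphs: each document is a set of
# (broader concept, narrower concept) edges from a concept hierarchy.
doc_graphs = {
    "d1": {("entity", "animal"), ("animal", "dog"), ("animal", "cat")},
    "d2": {("entity", "animal"), ("animal", "dog"), ("entity", "vehicle")},
    "d3": {("entity", "vehicle"), ("vehicle", "car")},
}


def frequent_edges(graphs, min_support):
    """Return edges that occur in at least `min_support` document-graphs.

    Assumption: single edges stand in for frequent subgraphs, and simple
    counting stands in for FP-growth.
    """
    counts = Counter(edge for edges in graphs.values() for edge in edges)
    return {edge for edge, count in counts.items() if count >= min_support}


def similarity(graph_a, graph_b, frequent):
    """Count the frequent edges that both document-graphs contain."""
    return len(graph_a & graph_b & frequent)


frequent = frequent_edges(doc_graphs, min_support=2)

# Pairwise similarity scores, keyed by (document, document).
scores = {
    (a, b): similarity(doc_graphs[a], doc_graphs[b], frequent)
    for a, b in combinations(sorted(doc_graphs), 2)
}

for pair, score in scores.items():
    print(pair, score)
```

In this toy data, `d1` and `d2` share two frequent edges while `d1` and `d3` share none, so a clustering step built on these scores would group the first two documents together. A fuller treatment would mine multi-edge subgraphs rather than single edges.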


