Frequent pattern-growth approach for document organization

Ontologies and Information Systems for the Semantic Web. Pub Date: 2008-10-30. DOI: 10.1145/1458484.1458496

Monika Akbar, R. Angryk


Citations: 17

Abstract


In this paper, we propose a document clustering mechanism that depends on the appearance of frequent senses in the documents rather than on the co-occurrence of frequent keywords. Instead of representing each document as a collection of keywords, we use a document-graph that reflects a conceptual hierarchy of the keywords related to that document. We combine a graph mining approach with FP-growth, a well-known association rule mining procedure, to discover the frequent subgraphs among the document-graphs. The similarity of the documents is measured by the number of frequent subgraphs appearing in their corresponding document-graphs. We believe that this approach allows us to cluster documents based on their senses rather than on the actual keywords.
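The similarity measure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation and it does not run FP-growth. As a simplifying assumption, it treats each single edge of a document-graph as a subgraph, finds the frequent ones by plain support counting, and reads "similarity" as the number of frequent edges two document-graphs share. The toy graphs, the concept names, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy document-graphs: each is a set of (child, parent) edges
# in a concept hierarchy. The concept names are invented for illustration.
doc_graphs = {
    "d1": {("dog", "canine"), ("canine", "mammal"), ("mammal", "animal")},
    "d2": {("cat", "feline"), ("feline", "mammal"), ("mammal", "animal")},
    "d3": {("oak", "tree"), ("tree", "plant")},
    "d4": {("wolf", "canine"), ("canine", "mammal"), ("mammal", "animal")},
}


def frequent_edges(graphs, min_support):
    """Return the edges that occur in at least min_support document-graphs.

    Assumption: single edges stand in for the frequent subgraphs that the
    paper discovers with an FP-growth-based procedure.
    """
    counts = Counter(edge for edges in graphs.values() for edge in edges)
    return {edge for edge, n in counts.items() if n >= min_support}


def similarity(graphs, frequent, a, b):
    """Count the frequent edges present in both document-graphs a and b."""
    return len(graphs[a] & graphs[b] & frequent)


frequent = frequent_edges(doc_graphs, min_support=2)

# Pairwise similarity for every pair of documents, keyed by (doc_a, doc_b).
pairwise = {
    (a, b): similarity(doc_graphs, frequent, a, b)
    for a, b in combinations(sorted(doc_graphs), 2)
}
```

With these toy inputs, `d1` and `d4` share two frequent edges even though their leaf keywords ("dog" and "wolf") differ, which is the kind of sense-level overlap the abstract contrasts with keyword co-occurrence. The resulting pairwise counts could then be passed to any standard clustering routine.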

