Frequent pattern-growth approach for document organization

Monika Akbar, R. Angryk
Published in: Ontologies and Information Systems for the Semantic Web, 2008-10-30
DOI: https://doi.org/10.1145/1458484.1458496
Citations: 17

Abstract

In this paper, we propose a document clustering mechanism that depends on the appearance of frequent senses in the documents rather than on the co-occurrence of frequent keywords. Instead of representing each document as a collection of keywords, we use a document-graph that reflects a conceptual hierarchy of the keywords related to that document. We combine a graph mining approach with FP-growth, a well-known association rule mining procedure, to discover the frequent subgraphs among the document-graphs. The similarity of two documents is measured by the number of frequent subgraphs that appear in both of their document-graphs. We believe that this approach lets us cluster documents based on their senses rather than on the actual keywords.
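The similarity measure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each document-graph is given as a set of hypothetical (child, parent) concept edges, it treats single edges as the "subgraphs", and it counts support directly instead of building an FP-tree. All names and data below are made up for illustration.

```python
from collections import Counter

# Hypothetical toy document-graphs: each document is a set of
# (child concept, parent concept) edges from a concept hierarchy.
doc_graphs = {
    "d1": {("dog", "canine"), ("canine", "animal"),
           ("cat", "feline"), ("feline", "animal")},
    "d2": {("dog", "canine"), ("canine", "animal"), ("car", "vehicle")},
    "d3": {("car", "vehicle"), ("truck", "vehicle"), ("canine", "animal")},
}


def frequent_edges(graphs, min_support):
    """Return edges that occur in at least min_support document-graphs.

    Simplification: single edges stand in for frequent subgraphs.
    """
    counts = Counter(edge for edges in graphs.values() for edge in edges)
    return {edge for edge, count in counts.items() if count >= min_support}


def similarity(doc_a, doc_b, graphs, frequent):
    """Count the frequent edges that both document-graphs contain."""
    return len(graphs[doc_a] & graphs[doc_b] & frequent)


frequent = frequent_edges(doc_graphs, min_support=2)
print(similarity("d1", "d2", doc_graphs, frequent))  # 2
print(similarity("d1", "d3", doc_graphs, frequent))  # 1
```

Here d1 and d2 share two frequent edges while d1 and d3 share only one, so a clustering step built on this count would place d1 closer to d2. The keyword "cat" never has to match "dog" for the documents to be related; the shared higher-level edges carry the similarity.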