Mining and ranking of generalized multi-dimensional frequent subgraphs

André Petermann, G. Micale, Giacomo Bergami, A. Pulvirenti, E. Rahm
Published in: 2017 Twelfth International Conference on Digital Information Management (ICDIM), 2017-09-01
DOI: 10.1109/ICDIM.2017.8244685 (https://doi.org/10.1109/ICDIM.2017.8244685)
Citations: 9

Abstract

Frequent pattern mining is an important research field and can be applied to different labeled data structures ranging from itemsets to graphs. In some scenarios, labels belong to a taxonomy, and generalized patterns can be mined by replacing labels with their ancestors. In this work, we propose a novel approach to generalized frequent subgraph mining. In contrast to existing work, our approach considers new requirements from use cases beyond molecular databases. In particular, we support directed multigraphs as well as multiple taxonomies to deal with the different semantic meanings of vertices. Since the result sets of generalized frequent subgraph mining can be very large, we use a fast analytical method of p-value estimation to rank results by significance. We propose two extensions of the popular gSpan algorithm that mine frequent subgraphs across all taxonomy levels. We compare both algorithms in an experimental evaluation based on a database of business process executions represented by graphs.
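
To make the idea of generalizing labels along a taxonomy concrete, the following is a minimal sketch and not the gSpan extensions proposed in the paper. It only handles single-edge patterns, and the taxonomy, the labels, the toy graph database, and the function names `generalizations` and `frequent_generalized_edges` are all hypothetical, chosen purely for illustration. Each graph is assumed to be a list of directed edges of the form (source label, edge label, target label).

```python
from collections import Counter
from itertools import product

# Hypothetical taxonomy: child label -> parent label (roots have no entry).
# Two separate trees stand in for two vertex "dimensions" (products, people).
TAXONOMY = {
    "Laptop": "Hardware",
    "Printer": "Hardware",
    "Hardware": "Product",
    "Alice": "Employee",
    "Bob": "Employee",
}


def generalizations(label, taxonomy):
    """Return the label followed by all of its ancestors, most specific first."""
    chain = [label]
    while chain[-1] in taxonomy:
        chain.append(taxonomy[chain[-1]])
    return chain


def frequent_generalized_edges(graphs, taxonomy, min_support):
    """Return generalized single-edge patterns contained in >= min_support graphs."""
    support = Counter()
    for edges in graphs:
        patterns = set()  # a pattern is counted at most once per graph
        for src, edge_label, dst in edges:
            # Replace each vertex label by itself or any of its ancestors.
            for g_src, g_dst in product(generalizations(src, taxonomy),
                                        generalizations(dst, taxonomy)):
                patterns.add((g_src, edge_label, g_dst))
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= min_support}


# Toy database: no edge is shared verbatim by both graphs.
graphs = [
    [("Alice", "sells", "Laptop")],
    [("Bob", "sells", "Printer")],
]
result = frequent_generalized_edges(graphs, TAXONOMY, min_support=2)
```

In this toy database no concrete edge occurs in both graphs, yet the generalized patterns `("Employee", "sells", "Hardware")` and `("Employee", "sells", "Product")` are each supported by both. The example also hints at why the abstract mentions large result sets: every combination of ancestor labels yields a candidate pattern, so the number of candidates grows quickly with taxonomy depth.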