Mining and ranking of generalized multi-dimensional frequent subgraphs

2017 Twelfth International Conference on Digital Information Management (ICDIM). Pub Date: 2017-09-01. DOI: 10.1109/ICDIM.2017.8244685

André Petermann, G. Micale, Giacomo Bergami, A. Pulvirenti, E. Rahm

Citations: 9

Abstract

Frequent pattern mining is an important research field and can be applied to different labeled data structures ranging from itemsets to graphs. There are scenarios where a label can be assigned to a taxonomy and generalized patterns can be mined by replacing labels with their ancestors. In this work, we propose a novel approach to generalized frequent subgraph mining. In contrast to existing work, our approach considers new requirements from use cases beyond molecular databases. In particular, we support directed multigraphs as well as multiple taxonomies to deal with the different semantic meanings of vertices. Since the result sets of generalized frequent subgraph mining can be very large, we use a fast analytical method of p-value estimation to rank results by significance. We propose two extensions of the popular gSpan algorithm that mine frequent subgraphs across all taxonomy levels. We compare both algorithms in an experimental evaluation based on a database of business process executions represented by graphs.
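The idea of generalizing labels along several taxonomies can be illustrated with a small sketch. It is not the paper's gSpan extension: it ignores edges and graph structure entirely and only counts generalized vertex-label combinations, and the taxonomies, labels, and function names (`TAXONOMIES`, `ancestors_or_self`, `generalizations`, `frequent_generalized`) are hypothetical, chosen for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical taxonomies, one per vertex dimension (child -> parent).
TAXONOMIES = {
    "employee": {"alice": "sales", "bob": "sales", "sales": "employee"},
    "product": {"laptop": "computer", "tablet": "computer", "computer": "product"},
}

# Toy database: each "graph" is reduced to a tuple of (dimension, label) vertices.
DATABASE = [
    (("employee", "alice"), ("product", "laptop")),
    (("employee", "bob"), ("product", "tablet")),
]


def ancestors_or_self(label, parent_of):
    """Return the label followed by all of its ancestors up to the taxonomy root."""
    chain = [label]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain


def generalizations(vertices):
    """Enumerate every combination of per-vertex generalization levels."""
    chains = [ancestors_or_self(label, TAXONOMIES[dim]) for dim, label in vertices]
    return set(product(*chains))


def frequent_generalized(database, min_support):
    """Count each generalized label tuple once per graph and keep the frequent ones."""
    counts = Counter()
    for vertices in database:
        counts.update(generalizations(vertices))
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    for pattern, support in sorted(frequent_generalized(DATABASE, 2).items()):
        print(pattern, support)
```

In this toy database no fully specific label pair occurs twice, yet generalized pairs such as `("sales", "computer")` reach the support threshold. This also shows why result sets grow quickly across taxonomy levels and why a significance-based ranking, as the abstract proposes, becomes useful.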
