Top-K Interesting Subgraph Discovery in Information Networks

2014 IEEE 30th International Conference on Data Engineering. Pub Date: 2014-03-01. DOI: 10.1109/ICDE.2014.6816703

Manish Gupta, Jing Gao, Xifeng Yan, H. Çam, Jiawei Han


Citations: 64

Abstract


In the real world, various systems can be modeled using heterogeneous networks, which consist of entities of different types. Many problems on such networks can be mapped to an underlying critical problem of discovering top-K subgraphs of entities with rare and surprising associations. Answering such subgraph queries efficiently involves two main challenges: (1) computing all matching subgraphs that satisfy the query and (2) ranking these results based on the rarity and the interestingness of the associations among entities in the subgraphs. Previous work on the matching problem can be harnessed for a naïve ranking-after-matching solution. However, for large graphs, subgraph queries may have an enormous number of matches, so it is inefficient to compute all matches when only the top-K matches are desired. In this paper, we address the two challenges of matching and ranking in top-K subgraph discovery as follows. First, we introduce two index structures for the network, a topology index and a graph maximum metapath weight index, both of which are computed offline. Second, we propose novel top-K mechanisms that exploit these indexes to answer interesting subgraph queries online efficiently. Experimental results on several synthetic datasets and on the DBLP and Wikipedia datasets containing thousands of entities show the efficiency and the effectiveness of the proposed approach in computing interesting subgraphs.
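The idea of avoiding a full ranking-after-matching pass can be illustrated with a small sketch. This is not the paper's algorithm or its index structures: it assumes a path-shaped query over node types, an interestingness score equal to the sum of edge weights, and a crude upper bound (the global maximum edge weight) standing in for what an offline weight index might provide. The function name `top_k_path_matches` and all parameters are hypothetical.

```python
import heapq
from itertools import count


def top_k_path_matches(node_type, edges, query_types, k):
    """Return up to k (score, path) pairs, best first.

    node_type:   dict mapping node -> type label
    edges:       dict mapping (u, v) -> weight, treated as undirected
    query_types: list of type labels that a matching path must follow
    Assumption: score of a match = sum of its edge weights.
    """
    if k <= 0 or not query_types:
        return []

    # Build an adjacency list from the undirected weighted edges.
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    # Assumed bound: no remaining edge can contribute more than max_w.
    max_w = max(edges.values(), default=0)

    heap = []            # min-heap of (score, tiebreak, path), size <= k
    tiebreak = count()   # keeps heap items comparable when scores tie

    def extend(path, score):
        remaining = len(query_types) - len(path)
        if remaining == 0:
            item = (score, next(tiebreak), tuple(path))
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif score > heap[0][0]:
                heapq.heapreplace(heap, item)
            return
        # Prune: the best possible completion cannot beat the k-th score.
        if len(heap) == k and score + remaining * max_w <= heap[0][0]:
            return
        wanted = query_types[len(path)]
        for nxt, w in adj.get(path[-1], []):
            if node_type[nxt] == wanted and nxt not in path:
                path.append(nxt)
                extend(path, score + w)
                path.pop()

    for start, t in node_type.items():
        if t == query_types[0]:
            extend([start], 0)

    ranked = sorted(heap, key=lambda item: (-item[0], item[1]))
    return [(score, path) for score, _, path in ranked]


# Toy heterogeneous network with made-up integer weights.
node_type = {"a1": "author", "a2": "author",
             "p1": "paper", "p2": "paper", "v1": "venue"}
edges = {("a1", "p1"): 9, ("a1", "p2"): 2, ("a2", "p2"): 7,
         ("p1", "v1"): 5, ("p2", "v1"): 8}
result = top_k_path_matches(node_type, edges, ["author", "paper", "venue"], 2)
```

The bounded heap keeps only K candidates in memory, and the pruning test discards partial matches early once K results exist. A tighter per-type or per-metapath bound would prune more aggressively than the single global maximum assumed here.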
