Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data: Latest Publications

OPAvion: mining and visualization in large graphs

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213941

L. Akoglu, Duen Horng Chau, U. Kang, Danai Koutra, C. Faloutsos

Abstract: Given a large graph with millions or billions of nodes and edges, like a who-follows-whom Twitter graph, how do we scalably compute its statistics, summarize its patterns, spot anomalies, visualize and make sense of it? We present OPAvion, a graph mining system that provides a scalable, interactive workflow to accomplish these analysis tasks. OPAvion consists of three modules: (1) The Summarization module (Pegasus) operates off-line on massive, disk-resident graphs and computes graph statistics, like PageRank scores, connected components, degree distribution, triangles, etc.; (2) The Anomaly Detection module (OddBall) uses graph statistics to mine patterns and spot anomalies, such as nodes with many contacts but few interactions with them (possibly telemarketers); (3) The Interactive Visualization module (Apolo) lets users incrementally explore the graph, starting with their chosen nodes or the flagged anomalous nodes; then users can expand to the nodes' vicinities, label them into categories, and thus interactively navigate the interesting parts of the graph. In our demonstration, we invite our audience to interact with OPAvion and try out its core capabilities on the Stack Overflow Q&A graph that describes over 6 million questions and answers among 650K users.
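The statistics named in the abstract above (degree distribution, connected components, triangles) and the "many contacts but few interactions" pattern can be illustrated on a toy in-memory graph. The sketch below is only an illustration under stated assumptions: the edge list, the function names, and the thresholds `min_contacts` and `max_avg_weight` are all hypothetical, the threshold rule is a simple stand-in rather than the OddBall method, and nothing here reflects how Pegasus processes disk-resident graphs.

```python
from collections import Counter, defaultdict

# Hypothetical toy graph: (u, v, weight), where weight = number of interactions.
EDGES = [
    ("a", "b", 5), ("a", "c", 4), ("b", "c", 6),
    ("t", "a", 1), ("t", "b", 1), ("t", "c", 1), ("t", "d", 1), ("t", "e", 1),
    ("x", "y", 3),
]


def build_adjacency(edges):
    """Build an undirected weighted adjacency map: node -> {neighbor: weight}."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return adj


def degree_distribution(adj):
    """Map each degree value to the number of nodes that have it."""
    return Counter(len(nbrs) for nbrs in adj.values())


def connected_components(adj):
    """Return the node sets of all connected components (iterative DFS)."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(comp)
    return components


def triangle_count(adj):
    """Count each triangle exactly once by requiring u < v < w."""
    count = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                for w in adj[v]:
                    if v < w and w in adj[u]:
                        count += 1
    return count


def many_contacts_few_interactions(adj, min_contacts=4, max_avg_weight=1.5):
    """Assumed stand-in rule: many neighbors, low average weight per neighbor."""
    flagged = []
    for node, nbrs in adj.items():
        contacts = len(nbrs)
        avg_weight = sum(nbrs.values()) / contacts
        if contacts >= min_contacts and avg_weight <= max_avg_weight:
            flagged.append(node)
    return sorted(flagged)


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    print(degree_distribution(adj))
    print(len(connected_components(adj)), "components")
    print(triangle_count(adj), "triangles")
    print(many_contacts_few_interactions(adj))
```

In this toy data, node `t` touches five neighbors with a single interaction each, so it is the only node the stand-in rule flags.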

Citations: 36

PAnG: finding patterns in annotation graphs

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213930

Philip Anderson, Andreas Thor, Joseph Benik, L. Raschid, Maria-Esther Vidal

Citations: 8

A model-based approach to attributed graph clustering

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213894

Zhiqiang Xu, Yiping Ke, Yi Wang, Hong Cheng, James Cheng

Abstract: Graph clustering, also known as community detection, is a long-standing problem in data mining. However, with the proliferation of rich attribute information available for objects in real-world graphs, how to leverage structural and attribute information for clustering attributed graphs becomes a new challenge. Most existing works take a distance-based approach. They propose various distance measures to combine structural and attribute information. In this paper, we consider an alternative view and propose a model-based approach to attributed graph clustering. We develop a Bayesian probabilistic model for attributed graphs. The model provides a principled and natural framework for capturing both structural and attribute aspects of a graph, while avoiding the artificial design of a distance measure. Clustering with the proposed model can be transformed into a probabilistic inference problem, for which we devise an efficient variational algorithm. Experimental results on large real-world datasets demonstrate that our method significantly outperforms the state-of-the-art distance-based attributed graph clustering method.

Citations: 319

Skimmer: rapid scrolling of relational query results

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213858

Manish Singh, Arnab Nandi, H. Jagadish

Citations: 20

Tiresias: the database oracle for how-to queries

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213875

A. Meliou, Dan Suciu

Abstract: How-To queries answer fundamental data analysis questions of the form: "How should the input change in order to achieve the desired output". As a Reverse Data Management problem, the evaluation of how-to queries is harder than their "forward" counterpart: hypothetical, or what-if queries. In this paper, we present Tiresias, the first system that provides support for how-to queries, allowing the definition and integrated evaluation of a large set of constrained optimization problems, specifically Mixed Integer Programming problems, on top of a relational database system. Tiresias generates the problem variables, constraints and objectives by issuing standard SQL statements, allowing for its integration with any RDBMS. The contributions of this work are the following: (a) we define how-to queries using possible world semantics, and propose the specification language TiQL (for Tiresias Query Language) based on simple extensions to standard Datalog. (b) We define translation rules that generate a Mixed Integer Program (MIP) from TiQL specifications, which can be solved using existing tools. (c) Tiresias implements powerful "data-aware" optimizations that are beyond the capabilities of modern MIP solvers, dramatically improving the system performance. (d) Finally, an extensive performance evaluation on the TPC-H dataset demonstrates the effectiveness of these optimizations, particularly highlighting the ability to apply divide-and-conquer methods to break MIP problems into smaller instances.
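To make the idea of a how-to query concrete, the sketch below poses one as a tiny integer optimization problem: given current order quantities, what is the smallest total increase that lifts revenue to a target? Everything in it is an assumption made for illustration: the `ORDERS` data, the function names, and the `max_increase` bound are hypothetical, the search is plain exhaustive enumeration rather than a MIP solver, and it does not use TiQL or reflect how Tiresias translates or optimizes queries.

```python
from itertools import product

# Hypothetical input relation: item -> (unit_price, current_quantity).
ORDERS = {"widget": (10, 3), "gadget": (25, 2), "gizmo": (40, 1)}


def revenue(quantities):
    """Forward ("what-if") computation: total revenue for given quantities."""
    return sum(ORDERS[item][0] * qty for item, qty in quantities.items())


def how_to(target_revenue, max_increase=3):
    """Reverse ("how-to") question, answered by brute force.

    Returns (total_increase, new_quantities) with the smallest total
    increase that reaches target_revenue, or None if no combination
    within max_increase per item reaches it.
    """
    items = sorted(ORDERS)
    best = None
    # Integer decision variables: one non-negative increase per item.
    for deltas in product(range(max_increase + 1), repeat=len(items)):
        new_qty = {item: ORDERS[item][1] + d for item, d in zip(items, deltas)}
        if revenue(new_qty) < target_revenue:
            continue  # constraint violated
        cost = sum(deltas)  # objective: minimize total change to the input
        if best is None or cost < best[0]:
            best = (cost, new_qty)
    return best


if __name__ == "__main__":
    current = {item: qty for item, (_, qty) in ORDERS.items()}
    print("current revenue:", revenue(current))
    print("how to reach 200:", how_to(200))
```

Enumeration grows exponentially with the number of variables, which is why realistic instances are handed to dedicated solvers; the toy only shows the shape of the problem (variables, constraint, objective).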

Citations: 81

Shark: fast data analysis using coarse-grained distributed memory

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213934

C. Engle, Antonio Lupher, Reynold Xin, M. Zaharia, M. Franklin, S. Shenker, I. Stoica

Citations: 147

Kaizen: a semi-automatic index advisor

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213932

I. Jimenez, H. Sánchez, Quoc Trung Tran, N. Polyzotis

Citations: 5

Skeleton automata for FPGAs: reconfiguring without reconstructing

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213863

J. Teubner, L. Woods, Chongling Nie

Citations: 34

Logos: a system for translating queries into narratives

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213929

Andreas Kokkalis, Panagiotis Vagenas, Alexandros Zervakis, A. Simitsis, G. Koutrika, Y. Ioannidis

Citations: 32

Automatic web-scale information extraction

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213912

P. Bohannon, Nilesh N. Dalvi, Yuval Filmus, Nori Jacoby, S. Keerthi, Alok Kirpal

Citations: 23