Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data: Latest Articles

OPAvion: mining and visualization in large graphs
L. Akoglu, Duen Horng Chau, U. Kang, Danai Koutra, C. Faloutsos
DOI: 10.1145/2213836.2213941 (https://doi.org/10.1145/2213836.2213941), published 2012-05-20
Abstract: Given a large graph with millions or billions of nodes and edges, like a who-follows-whom Twitter graph, how do we scalably compute its statistics, summarize its patterns, spot anomalies, visualize and make sense of it? We present OPAvion, a graph mining system that provides a scalable, interactive workflow to accomplish these analysis tasks. OPAvion consists of three modules: (1) The Summarization module (Pegasus) operates off-line on massive, disk-resident graphs and computes graph statistics, like PageRank scores, connected components, degree distribution, triangles, etc.; (2) The Anomaly Detection module (OddBall) uses graph statistics to mine patterns and spot anomalies, such as nodes with many contacts but few interactions with them (possibly telemarketers); (3) The Interactive Visualization module (Apolo) lets users incrementally explore the graph, starting with their chosen nodes or the flagged anomalous nodes; users can then expand to the nodes' vicinities, label them into categories, and thus interactively navigate the interesting parts of the graph.
In our demonstration, we invite our audience to interact with OPAvion and try out its core capabilities on the Stack Overflow Q&A graph, which describes over 6 million questions and answers among 650K users.
Citations: 36
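The abstract's example anomaly (a node with many contacts but few interactions with them) can be illustrated with a toy heuristic. This is a minimal sketch, not OddBall's scoring: the published method fits power-law patterns over egonet features, whereas this code simply assumes an undirected edge list of `(u, v, interactions)` triples without duplicate edges and flags nodes whose interactions per contact fall below a hypothetical `min_ratio`, provided they have at least `min_degree` contacts.

```python
from collections import defaultdict


def degree_and_weight(edges):
    """Per-node degree (distinct contacts) and total interaction weight."""
    degree = defaultdict(int)
    weight = defaultdict(float)
    for u, v, w in edges:
        for node in (u, v):
            degree[node] += 1
            weight[node] += w
    return degree, weight


def flag_low_interaction(edges, min_degree=3, min_ratio=1.5):
    """Flag nodes with many contacts but few interactions per contact.

    Hypothetical heuristic for illustration; thresholds are assumptions.
    """
    degree, weight = degree_and_weight(edges)
    return sorted(
        n for n in degree
        if degree[n] >= min_degree and weight[n] / degree[n] < min_ratio
    )


# "t" calls four people once each; a, b, c interact with each other often.
edges = [("t", "a", 1), ("t", "b", 1), ("t", "c", 1), ("t", "d", 1),
         ("a", "b", 10), ("a", "c", 8), ("b", "c", 6)]
print(flag_low_interaction(edges))
```

Here `t` has four contacts but averages one interaction per contact, so it is the only node flagged; `d` is excluded because it has a single contact.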
PAnG: finding patterns in annotation graphs
Philip Anderson, Andreas Thor, Joseph Benik, L. Raschid, Maria-Esther Vidal
DOI: 10.1145/2213836.2213930 (https://doi.org/10.1145/2213836.2213930), published 2012-05-20
Abstract: Annotation graph datasets are a natural representation of scientific knowledge. They are common in the life sciences and health sciences, where concepts such as genes, proteins or clinical trials are annotated with controlled vocabulary terms from ontologies. We present a tool, PAnG (Patterns in Annotation Graphs), that is based on the complementary methodologies of graph summarization and dense subgraphs. The elements of a graph summary correspond to a pattern, and its visualization can provide an explanation of the underlying knowledge. Scientists can use PAnG to develop hypotheses and to explore the data.
Citations: 8
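The abstract does not say which dense-subgraph algorithm PAnG uses. As an illustration of one standard technique (not necessarily PAnG's), greedy peeling repeatedly removes a minimum-degree node and keeps the intermediate subgraph with the highest density |E|/|V|; Charikar showed this gives a 2-approximation for that density measure. The edge list below is a made-up example.

```python
def densest_subgraph(edges):
    """Greedy peeling for the densest-subgraph problem (density = |E| / |V|).

    Returns (node set, density) of the best intermediate subgraph seen.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = set(adj)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density, best_nodes = 0.0, set(nodes)
    while nodes:
        density = num_edges / len(nodes)
        if density > best_density:
            best_density, best_nodes = density, set(nodes)
        # Peel a node of minimum degree in the remaining subgraph.
        victim = min(nodes, key=lambda n: len(adj[n]))
        for nbr in adj[victim]:
            adj[nbr].discard(victim)
        num_edges -= len(adj[victim])
        nodes.remove(victim)
        del adj[victim]
    return best_nodes, best_density


# A 4-clique {a, b, c, d} with a pendant path d-e-f.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("d", "e"), ("e", "f")]
print(densest_subgraph(edges))
```

Peeling `f` and then `e` leaves the clique with 6 edges on 4 nodes (density 1.5), which beats the full graph's 8/6. The quadratic `min` scan keeps the sketch short; a bucket queue would make peeling linear.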
A model-based approach to attributed graph clustering
Zhiqiang Xu, Yiping Ke, Yi Wang, Hong Cheng, James Cheng
DOI: 10.1145/2213836.2213894 (https://doi.org/10.1145/2213836.2213894), published 2012-05-20
Abstract: Graph clustering, also known as community detection, is a long-standing problem in data mining. However, with the proliferation of rich attribute information available for objects in real-world graphs, how to leverage structural and attribute information for clustering attributed graphs becomes a new challenge. Most existing works take a distance-based approach. They propose various distance measures to combine structural and attribute information. In this paper, we consider an alternative view and propose a model-based approach to attributed graph clustering. We develop a Bayesian probabilistic model for attributed graphs. The model provides a principled and natural framework for capturing both structural and attribute aspects of a graph, while avoiding the artificial design of a distance measure. Clustering with the proposed model can be transformed into a probabilistic inference problem, for which we devise an efficient variational algorithm.
Experimental results on large real-world datasets demonstrate that our method significantly outperforms the state-of-the-art distance-based attributed graph clustering method.
Citations: 319
Skimmer: rapid scrolling of relational query results
Manish Singh, Arnab Nandi, H. Jagadish
DOI: 10.1145/2213836.2213858 (https://doi.org/10.1145/2213836.2213858), published 2012-05-20
Abstract: A relational database often yields a large set of tuples as the result of a query. Users browse this result set to find the information they require. If the result set is large, there may be many pages of data to browse. Since results comprise tuples of alphanumeric values that have few visual markers, it is hard to browse the data quickly, even if it is sorted. In this paper, we describe the design of a system for browsing relational data by scrolling through it at a high speed. Rather than showing the user a fast-changing blur, the system presents the user with a small number of representative tuples. Representative tuples are selected to provide a "good impression" of the query result. We show that the information loss to the user is limited, even at high scrolling speeds, and that our algorithms can pick good representatives fast enough to provide real-time, high-speed scrolling over large datasets.
Citations: 20
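The abstract does not describe how Skimmer picks its representative tuples. A minimal sketch of the general idea, assuming the result is already sorted and scrolled in fixed-size windows, picks `k` tuples spread evenly by position from each window; the helpers `representatives` and `skim` are hypothetical and ignore value diversity, which a real selection algorithm would likely weigh.

```python
def representatives(rows, k):
    """Pick k rows evenly spaced by position from one sorted window."""
    n = len(rows)
    if k <= 0 or n == 0:
        return []
    if k >= n:
        return list(rows)
    if k == 1:
        return [rows[n // 2]]
    # Spread k indices from the first row to the last row inclusive.
    step = (n - 1) / (k - 1)
    return [rows[round(i * step)] for i in range(k)]


def skim(rows, window, k):
    """Yield the representatives for each consecutive window of the result."""
    for start in range(0, len(rows), window):
        yield representatives(rows[start:start + window], k)


# Faster scrolling could map to a larger window with the same k.
print(list(skim(list(range(10)), window=5, k=3)))
```

With windows of five rows and three representatives each, the user sees rows 0, 2, 4 and then 5, 7, 9 instead of all ten.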
Tiresias: the database oracle for how-to queries
A. Meliou, Dan Suciu
DOI: 10.1145/2213836.2213875 (https://doi.org/10.1145/2213836.2213875), published 2012-05-20
Abstract: How-To queries answer fundamental data analysis questions of the form: "How should the input change in order to achieve the desired output?" As a Reverse Data Management problem, the evaluation of how-to queries is harder than that of their "forward" counterpart: hypothetical, or what-if, queries. In this paper, we present Tiresias, the first system that provides support for how-to queries, allowing the definition and integrated evaluation of a large set of constrained optimization problems, specifically Mixed Integer Programming problems, on top of a relational database system. Tiresias generates the problem variables, constraints and objectives by issuing standard SQL statements, allowing for its integration with any RDBMS. The contributions of this work are the following: (a) We define how-to queries using possible world semantics, and propose the specification language TiQL (for Tiresias Query Language) based on simple extensions to standard Datalog. (b) We define translation rules that generate a Mixed Integer Program (MIP) from TiQL specifications, which can be solved using existing tools. (c) Tiresias implements powerful "data-aware" optimizations that are beyond the capabilities of modern MIP solvers, dramatically improving the system performance.
(d) Finally, an extensive performance evaluation on the TPC-H dataset demonstrates the effectiveness of these optimizations, particularly highlighting the ability to apply divide-and-conquer methods to break MIP problems into smaller instances.
Citations: 81
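The contrast between a what-if query and a how-to query can be made concrete with a toy example. In this sketch the scenario is hypothetical (per-product price changes with fixed quantities), and the how-to query is answered by exhaustive search over a small integer domain; Tiresias instead translates TiQL into a Mixed Integer Program for an existing solver, which is what makes realistic sizes tractable.

```python
from itertools import product


def what_if(prices, quantities, deltas):
    """Forward (what-if) query: revenue after hypothetical price changes."""
    return sum((p + d) * q for p, q, d in zip(prices, quantities, deltas))


def how_to(prices, quantities, target, choices=(-1, 0, 1, 2)):
    """Reverse (how-to) query by brute force.

    Finds the price changes with minimal total |delta| whose revenue is at
    least `target`. Returns (cost, deltas), or None if no choice works.
    The search is exponential in the number of products.
    """
    best = None
    for deltas in product(choices, repeat=len(prices)):
        if what_if(prices, quantities, deltas) < target:
            continue
        cost = sum(abs(d) for d in deltas)
        if best is None or cost < best[0]:
            best = (cost, deltas)
    return best


print(what_if([10, 20], [5, 1], (0, 0)))   # current revenue
print(how_to([10, 20], [5, 1], target=75))  # cheapest way to reach 75
```

The forward query just evaluates one input, while the reverse query must search the space of inputs, which illustrates why the abstract calls how-to evaluation the harder direction.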
Shark: fast data analysis using coarse-grained distributed memory
C. Engle, Antonio Lupher, Reynold Xin, M. Zaharia, M. Franklin, S. Shenker, I. Stoica
DOI: 10.1145/2213836.2213934 (https://doi.org/10.1145/2213836.2213934), published 2012-05-20
Abstract: Shark is a research data analysis system built on a novel coarse-grained distributed shared-memory abstraction. Shark marries query processing with deep data analysis, providing a unified system for easy data manipulation using SQL and pushing sophisticated analysis closer to data. It scales to thousands of nodes in a fault-tolerant manner. Shark can answer queries 40X faster than Apache Hive and run machine learning programs 25X faster than MapReduce programs in Apache Hadoop on large datasets.
Citations: 147
Kaizen: a semi-automatic index advisor
I. Jimenez, H. Sánchez, Quoc Trung Tran, N. Polyzotis
DOI: 10.1145/2213836.2213932 (https://doi.org/10.1145/2213836.2213932), published 2012-05-20
Abstract: Index tuning, i.e., selecting indexes that are appropriate for the workload to obtain good system performance, is a crucial task for database administrators. Administrators rely on automated index advisors for this task, but existing advisors work either offline, requiring a priori knowledge of the workload, or online, taking the administrator out of the picture and assuming total control of the index tuning task. Semi-automatic index tuning is a new paradigm that achieves a middle ground: the advisor analyzes the workload online and provides recommendations tailored to the current workload, and the administrator is able to provide feedback to refine future recommendations. In this demonstration we present Kaizen, an index tuning tool that implements semi-automatic tuning.
Citations: 5
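The semi-automatic loop in the abstract (online workload analysis plus administrator feedback) can be sketched as a small class. Everything here is a hypothetical design, not Kaizen's: it assumes some cost model already supplies an estimated benefit per candidate index for each query, decays old benefits so recommendations track the current workload, and lets the administrator pin or veto indexes for future rounds.

```python
from collections import defaultdict


class SemiAutoAdvisor:
    """Toy semi-automatic index advisor (hypothetical, not Kaizen's design)."""

    def __init__(self, decay=0.9):
        self.decay = decay               # weight kept by older observations
        self.score = defaultdict(float)  # candidate index -> decayed benefit
        self.vetoed = set()
        self.pinned = set()

    def observe(self, benefits):
        """Record one query's estimated benefit per candidate index."""
        for idx in self.score:
            self.score[idx] *= self.decay
        for idx, gain in benefits.items():
            self.score[idx] += gain

    def feedback(self, index, accept):
        """Administrator feedback: accept pins the index, reject vetoes it."""
        if accept:
            self.pinned.add(index)
            self.vetoed.discard(index)
        else:
            self.vetoed.add(index)
            self.pinned.discard(index)

    def recommend(self, k):
        """Pinned indexes first, then the best-scoring remaining candidates."""
        ranked = sorted(
            (i for i in self.score if i not in self.vetoed and i not in self.pinned),
            key=lambda i: -self.score[i],
        )
        return sorted(self.pinned) + ranked[: max(0, k - len(self.pinned))]


advisor = SemiAutoAdvisor(decay=0.5)
advisor.observe({"idx_a": 10.0})
advisor.observe({"idx_b": 8.0})   # idx_a decays to 5.0
print(advisor.recommend(1))
```

The decay factor is the knob between the offline extreme (never forget) and a purely reactive advisor (forget immediately); the feedback sets keep the administrator in the loop.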
Skeleton automata for FPGAs: reconfiguring without reconstructing
J. Teubner, L. Woods, Chongling Nie
DOI: 10.1145/2213836.2213863 (https://doi.org/10.1145/2213836.2213863), published 2012-05-20
Abstract: While the performance opportunities of field-programmable gate arrays (FPGAs) for high-volume query processing are well known, system makers still have to compromise between desired query expressiveness and high compilation effort. The cost of the latter is the primary limitation in building efficient FPGA/CPU hybrids. In this work we report on an FPGA-based stream processing engine that does not have this limitation. We provide a hardware implementation of XML projection [14] that can be reconfigured in less than a microsecond, yet supports a rich and expressive dialect of XPath. By performing XML projection in the network, we can fully leverage its filtering effect and improve XQuery performance by several factors. These improvements are made possible by a new design approach for FPGA acceleration, called skeleton automata. Skeleton automata separate the structure of finite-state automata from their semantics. Since individual queries only affect the latter, with our approach, query workload changes can be accommodated fast and with high expressiveness.
Citations: 34
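The key idea in the abstract, separating an automaton's structure from its semantics, has a rough software analogy. In this sketch (hypothetical, and limited to child-axis paths such as `/a/b/c`, far less than the XPath dialect the paper supports) the transition logic and state capacity are fixed, and a new query only writes labels into per-state slots, so reconfiguring is a table write rather than a rebuild. A stack tracks the automaton state at each nesting level of a stream of tag events.

```python
class SkeletonPathMatcher:
    """Fixed-capacity matcher for child-axis paths such as /a/b/c.

    Software analogy only: the 'skeleton' (capacity, transition logic)
    never changes; configure() just loads labels into state slots.
    """

    DEAD = -1

    def __init__(self, capacity):
        self.capacity = capacity
        self.labels = [None] * capacity  # per-state "semantics" slots
        self.length = 0

    def configure(self, path):
        """Load a new query, e.g. '/a/b/c'; '*' matches any tag."""
        steps = [s for s in path.split("/") if s]
        if len(steps) > self.capacity:
            raise ValueError("query exceeds skeleton capacity")
        self.labels[: len(steps)] = steps
        self.length = len(steps)

    def run(self, events):
        """Return ids of start tags whose root-to-node path matches the query.

        `events` holds ("start", tag, id) or ("end", tag, id) tuples.
        """
        stack = [0]  # state 0: nothing matched yet (document root)
        matches = []
        for kind, tag, node_id in events:
            if kind == "start":
                state = stack[-1]
                if (state != self.DEAD and state < self.length
                        and self.labels[state] in (tag, "*")):
                    state += 1
                else:
                    state = self.DEAD
                stack.append(state)
                if state == self.length:
                    matches.append(node_id)
            else:
                stack.pop()
        return matches


# <a id=1><b id=2><c id=3/></b><c id=4/></a>
events = [("start", "a", 1), ("start", "b", 2), ("start", "c", 3),
          ("end", "c", 3), ("end", "b", 2), ("start", "c", 4),
          ("end", "c", 4), ("end", "a", 1)]
matcher = SkeletonPathMatcher(capacity=4)
matcher.configure("/a/b/c")
print(matcher.run(events))
matcher.configure("/a/c")  # reconfigure without rebuilding the matcher
print(matcher.run(events))
```

The same matcher object answers both queries; only the label slots changed between runs.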
Logos: a system for translating queries into narratives
Andreas Kokkalis, Panagiotis Vagenas, Alexandros Zervakis, A. Simitsis, G. Koutrika, Y. Ioannidis
DOI: 10.1145/2213836.2213929 (https://doi.org/10.1145/2213836.2213929), published 2012-05-20
Abstract: This paper presents Logos, a system that provides natural language translations for relational queries expressed in SQL. Our translation mechanism is based on a graph-based approach to the query translation problem. We represent various forms of structured queries as directed graphs and we annotate the graph edges with template labels using an extensible template mechanism. Logos uses different graph traversal strategies for efficiently exploring these graphs and composing textual query descriptions. The audience may interactively explore Logos using various database schemata and issuing either sample or ad hoc queries.
Citations: 32
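The graph-plus-templates approach in the abstract can be illustrated with a small sketch. The schema, noun phrases, and templates below are hypothetical, and the depth-first traversal is just one possible strategy (the abstract says Logos supports several): each relation is a node, each join edge carries a template with a `{dst}` placeholder, and the description of a target subtree is substituted into the template so nested joins read as nested clauses.

```python
def narrate(root, nouns, edges, projection):
    """Describe a select-project-join query graph in English.

    nouns: relation -> noun phrase.
    edges: (src, dst) -> template containing a {dst} placeholder.
    projection: projected attribute names of the root relation.
    """
    adj = {}
    for (src, dst), template in edges.items():
        adj.setdefault(src, []).append((dst, template))
    seen = {root}

    def describe(node):
        # Depth-first: each neighbor's own description fills the template.
        clauses = []
        for dst, template in adj.get(node, []):
            if dst in seen:
                continue
            seen.add(dst)
            clauses.append(template.format(dst=describe(dst)))
        return nouns[node] + ("" if not clauses else " " + " and ".join(clauses))

    if len(projection) > 1:
        attrs = ", ".join(projection[:-1]) + " and " + projection[-1]
    else:
        attrs = projection[0]
    return f"Find the {attrs} of {describe(root)}."


nouns = {"movies": "movies", "directors": "a director",
         "awards": "an award", "genres": "a genre"}
edges = {("movies", "directors"): "directed by {dst}",
         ("directors", "awards"): "who won {dst}",
         ("movies", "genres"): "classified under {dst}"}
print(narrate("movies", nouns, edges, ["title", "year"]))
```

A breadth-first traversal would emit "who won an award" after "classified under a genre", detaching the clause from the director it describes, which is one reason the choice of traversal strategy matters.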
Automatic web-scale information extraction
P. Bohannon, Nilesh N. Dalvi, Yuval Filmus, Nori Jacoby, S. Keerthi, Alok Kirpal
DOI: 10.1145/2213836.2213912 (https://doi.org/10.1145/2213836.2213912), published 2012-05-20
Abstract: In this demonstration, we showcase the technologies that we are building at Yahoo! for Web-scale Information Extraction. Given any new Website containing semi-structured information about a pre-specified set of schemas, we show how to populate objects in the corresponding schema by automatically extracting information from the Website.
Citations: 23