Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data: Latest Articles, Page 4

Adaptive optimizations of recursive queries in teradata

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213966

A. Ghazal, Dawit Yimam Seid, A. Crolotte, Mohammed Al-Kateb

Abstract: Recursive queries were introduced as part of ANSI SQL 99 to support processing of hierarchical data typical of air flight schedules, bill-of-materials, data cube dimension hierarchies, and ancestor-descendant information (e.g., XML data stored in relations). Recently, recursive queries have also found extensive use in web data analysis such as social network and click stream data. Teradata implemented recursive queries in V2R6 using static plans whereby a query is executed in multiple iterations, each iteration corresponding to one level of the recursion. Such a static planning strategy may not be optimal since the demographics of intermediate results from recursive iterations often vary to a great extent. Gathering feedback at each iteration could address this problem by providing size estimates to the optimizer which, in turn, can produce an execution plan for the next iteration. However, such a full feedback scheme suffers from lack of pipelining and the inability to exploit global optimizations across the different recursion iterations. In this paper, we propose adaptive optimization techniques that avoid the issues with static as well as full feedback optimization approaches. Our approach employs a mix of multi-iteration pre-planning and dynamic feedback techniques which are generally applicable to any recursive query implementation in an RDBMS. We also validated the effectiveness of our proposed techniques by conducting experiments on a prototype implementation using real-life social network data from the FriendFeed online blogging service.
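The level-by-level execution described in the abstract above can be illustrated with a minimal Python sketch. It is not Teradata's implementation: the function name `reachable_by_level`, the adjacency index, and the sample flight data are all hypothetical. It only shows how a reachability recursion runs one iteration per level and how the size of each intermediate result, the kind of feedback an optimizer could consume, can differ from one iteration to the next.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def reachable_by_level(edges: List[Edge], seed: str) -> Tuple[Set[str], List[int]]:
    """Evaluate a reachability recursion one level per iteration.

    Returns the nodes reachable from `seed` (including `seed`) and the size
    of the intermediate (delta) result processed by each iteration.
    """
    # Adjacency index standing in for the join between the delta and the base table.
    adjacency: Dict[str, List[str]] = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    reached: Set[str] = set()
    delta: Set[str] = {seed}            # seed step of the recursion
    delta_sizes: List[int] = []

    while delta:
        delta_sizes.append(len(delta))  # per-iteration size feedback
        reached |= delta
        # Recursive step: follow edges out of the current delta, drop known nodes.
        next_delta = {dst for src in delta for dst in adjacency.get(src, [])}
        delta = next_delta - reached

    return reached, delta_sizes


if __name__ == "__main__":
    # Hypothetical flight legs, used only for illustration.
    flights = [("SFO", "DEN"), ("SFO", "LAX"), ("DEN", "ORD"),
               ("LAX", "ORD"), ("ORD", "JFK"), ("JFK", "SFO")]
    nodes, sizes = reachable_by_level(flights, "SFO")
    print(sorted(nodes))
    print(sizes)
```

A static plan would treat every pass of the `while` loop identically, whereas a feedback-driven scheme could inspect `delta_sizes` after each pass before planning the next one.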

Citations: 14

TAO: how facebook serves the social graph

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213957

Venkateshwaran Venkataramani, Zach Amsden, N. Bronson, G. Cabrera, Prasad Chakka, Peter Dimov, Hui Ding, J. Ferris, A. Giardullo, Jeremy Hoon, Sachin Kulkarni, Nathan Lawrence, Mark Marchukov, Dmitri Petrov, Lovro Puzar

Citations: 59

Oracle in-database hadoop: when mapreduce meets RDBMS

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213955

X. Su, G. Swart

Abstract: Big data is the tar sands of the data world: vast reserves of raw gritty data whose valuable information content can only be extracted at great cost. MapReduce is a popular parallel programming paradigm well suited to the programmatic extraction and analysis of information from these unstructured Big Data reserves. The Apache Hadoop implementation of MapReduce has become an important player in this market due to its ability to exploit large networks of inexpensive servers. The increasing importance of unstructured data has driven interest in MapReduce and its Apache Hadoop implementation, which in turn has led data processing vendors to take an interest in supporting this programming style. Oracle RDBMS has had support for the MapReduce paradigm for many years through the mechanism of user defined pipelined table functions and aggregation objects. However, such support has not been Hadoop source compatible. Native Hadoop programs needed to be rewritten before becoming usable in this framework. The ability to run Hadoop programs inside the Oracle database provides a versatile solution to database users, allowing them to use programming skills they may already possess and to exploit the growing Hadoop ecosystem. In this paper, we describe a prototype of Oracle In-Database Hadoop that supports the running of native Hadoop applications written in Java. This implementation executes Hadoop applications using the efficient parallel capabilities of the Oracle database and a subset of the Apache Hadoop infrastructure. This system's target audience includes both SQL and Hadoop users. We discuss the architecture and design, and in particular, demonstrate how MapReduce functionalities are seamlessly integrated within SQL queries. We also share our experience in building such a system within Oracle database and follow-on topics that we think are promising areas for exploration.

Citations: 32

Test Of Time Award Talk: Executing SQL over Encrypted Data in the Database-Service-Provider Model

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2370917

H. Hacigumus, Balakrishna (Bala) Iyer, Chen Li, S. Mehrotra

Citations: 3

Amazon dynamoDB: a seamlessly scalable non-relational database service

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213945

S. Sivasubramanian

Citations: 150

Towards effective partition management for large graphs

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213895

Shengqi Yang, Xifeng Yan, Bo Zong, Arijit Khan

Abstract: Searching and mining large graphs today is critical to a variety of application domains, ranging from community detection in social networks to de novo genome sequence assembly. Scalable processing of large graphs requires careful partitioning and distribution of graphs across clusters. In this paper, we investigate the problem of managing large-scale graphs in clusters and study access characteristics of local graph queries such as breadth-first search, random walk, and SPARQL queries, which are popular in real applications. These queries exhibit strong access locality, and therefore require specific data partitioning strategies. In this work, we propose a Self Evolving Distributed Graph Management Environment (Sedge) to minimize inter-machine communication during graph query processing in multiple machines. In order to improve query response time and throughput, Sedge introduces a two-level partition management architecture with complementary primary partitions and dynamic secondary partitions. These two kinds of partitions are able to adapt in real time to changes in query workload. Sedge also includes a set of workload analyzing algorithms whose time complexity is linear or sublinear in graph size. Empirical results show that Sedge significantly improves distributed graph processing on today's commodity clusters.
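To make the notion of inter-machine communication for a local query concrete, the following Python sketch counts how many hops of a bounded breadth-first search cross a partition boundary under a given vertex-to-partition assignment. It is an illustration only and not the Sedge system: the function `cross_partition_hops`, the path graph, and both partitionings are hypothetical, and the sketch assumes that a query could be served from whichever partitioning keeps its neighborhood local.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def cross_partition_hops(edges: List[Edge], partition_of: Dict[int, int],
                         start: int, max_depth: int) -> int:
    """Count first-visit BFS hops whose two endpoints lie in different partitions."""
    neighbors: Dict[int, List[int]] = {}
    for u, v in edges:                      # treat the graph as undirected
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)

    seen = {start}
    frontier = deque([(start, 0)])
    crossings = 0
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue                        # do not expand beyond the depth bound
        for nxt in neighbors.get(node, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            if partition_of[node] != partition_of[nxt]:
                crossings += 1              # this hop would need inter-machine traffic
            frontier.append((nxt, depth + 1))
    return crossings


if __name__ == "__main__":
    path = [(i, i + 1) for i in range(7)]   # hypothetical path graph 0-1-...-7
    # Two hypothetical partitionings that cut the path at different places.
    part_a = {n: 0 if n < 4 else 1 for n in range(8)}
    part_b = {n: 0 if n < 2 else (1 if n < 6 else 2) for n in range(8)}
    print(cross_partition_hops(path, part_a, start=3, max_depth=1))
    print(cross_partition_hops(path, part_b, start=3, max_depth=1))
```

In this toy case vertex 3 sits on a boundary in the first partitioning but in the interior of the second, so the same one-hop query crosses machines under one assignment and stays local under the other.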

Citations: 171

GLADE: big data analytics made easy

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213936

Yu Cheng, Chengjie Qin, Florin Rusu

Citations: 70

Divergent physical design tuning for replicated databases

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213843

M. Consens, Kleoni Ioannidou, J. LeFevre, N. Polyzotis

Abstract: We introduce divergent designs as a novel tuning paradigm for database systems that employ replication. A divergent design installs a different physical configuration (e.g., indexes and materialized views) with each database replica, specializing replicas for different subsets of the workload. At runtime, queries are routed to the subset of the replicas configured to yield the most efficient execution plans. When compared to uniformly designed replicas, divergent replicas can potentially execute their subset of the queries significantly faster, and their physical configurations could be initialized and maintained (updated) in less time. However, the specialization of divergent replicas limits the ability to load-balance the workload at runtime. We formalize the divergent design problem, characterize the properties of good designs, and analyze the complexity of identifying the optimal divergent design. Our paradigm captures the trade-off between load balancing among all n replicas vs. load balancing among m ≤ n specialized replicas. We develop an effective algorithm (leveraging single-node-tuning functionality) to compute good divergent designs for all the points of this trade-off. Experimental results validate the effectiveness of the algorithm and demonstrate that divergent designs can substantially improve workload performance.
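The trade-off between routing to m specialized replicas and balancing across all n replicas can be sketched in a few lines of Python. This is a hedged illustration, not the authors' algorithm: the functions `route` and `replica_load`, the cost table, and the even split of each query across its chosen replicas are assumptions made for the example.

```python
from typing import Dict, List


def route(costs: Dict[str, List[float]], m: int) -> Dict[str, List[int]]:
    """For each query, pick the m replicas with the lowest estimated cost."""
    routing: Dict[str, List[int]] = {}
    for query, per_replica in costs.items():
        if not 1 <= m <= len(per_replica):
            raise ValueError("m must satisfy 1 <= m <= number of replicas")
        # Rank replicas by cost; ties are broken by replica index.
        ranked = sorted(range(len(per_replica)), key=lambda r: (per_replica[r], r))
        routing[query] = sorted(ranked[:m])
    return routing


def replica_load(costs: Dict[str, List[float]],
                 routing: Dict[str, List[int]], n: int) -> List[float]:
    """Split each query evenly over its chosen replicas and sum the cost per replica."""
    load = [0.0] * n
    for query, replicas in routing.items():
        for r in replicas:
            load[r] += costs[query][r] / len(replicas)
    return load


if __name__ == "__main__":
    # Hypothetical cost estimates: each query is cheap on one specialized replica.
    costs = {"q1": [3.0, 12.0, 12.0],
             "q2": [12.0, 3.0, 12.0],
             "q3": [12.0, 12.0, 3.0]}
    for m in (1, 3):
        routing = route(costs, m)
        print(m, routing, replica_load(costs, routing, n=3))
```

With m = 1 every query runs only where it is cheapest, so per-replica load is low but a query has no alternative replica; with m = 3 every replica can serve every query, at the price of running most of the work on replicas that are not tuned for it.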

Citations: 21

From x100 to vectorwise: opportunities, challenges and things most researchers do not think about

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213967

M. Zukowski, P. Boncz

Citations: 11

Temporal alignment

Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data Pub Date : 2012-05-20 DOI: 10.1145/2213836.2213886

Anton Dignös, Michael H. Böhlen, J. Gamper

Citations: 60