{"title":"F1: the fault-tolerant distributed RDBMS supporting google's ad business","authors":"J. Shute, Mircea Oancea, Stephan Ellner, B. Handy, Eric Rollins, Bart Samwel, Radek Vingralek, Chad Whipkey, Xin Chen, Beat Jegerlehner, Kyle Littlefield, Phoenix Tong","doi":"10.1145/2213836.2213954","DOIUrl":"https://doi.org/10.1145/2213836.2213954","url":null,"abstract":"Many of the services that are critical to Google's ad business have historically been backed by MySQL. We have recently migrated several of these services to F1, a new RDBMS developed at Google. F1 implements rich relational database features, including a strictly enforced schema, a powerful parallel SQL query engine, general transactions, change tracking and notification, and indexing, and is built on top of a highly distributed storage system that scales on standard hardware in Google data centers. The store is dynamically sharded, supports transactionally-consistent replication across data centers, and is able to handle data center outages without data loss. The strong consistency properties of F1 and its storage system come at the cost of higher write latencies compared to MySQL. Having successfully migrated a rich customer-facing application suite at the heart of Google's ad business to F1, with no downtime, we will describe how we restructured schema and applications to largely hide this increased latency from external users. The distributed nature of F1 also allows it to scale easily and to support significantly higher throughput for batch workloads than a traditional RDBMS. With F1, we have built a novel hybrid system that combines the scalability, fault tolerance, transparent sharding, and cost benefits so far available only in \"NoSQL\" systems with the usability, familiarity, and transactional guarantees expected from an RDBMS.","PeriodicalId":212616,"journal":{"name":"Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127041593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probase: a probabilistic taxonomy for text understanding","authors":"Wentao Wu, Hongsong Li, Haixun Wang, Kenny Q. Zhu","doi":"10.1145/2213836.2213891","DOIUrl":"https://doi.org/10.1145/2213836.2213891","url":null,"abstract":"Knowledge is indispensable to understanding. The ongoing information explosion highlights the need to enable machines to better understand electronic text in human language. Much work has been devoted to creating universal ontologies or taxonomies for this purpose. However, none of the existing ontologies has the needed depth and breadth for universal understanding. In this paper, we present a universal, probabilistic taxonomy that is more comprehensive than any existing ones. It contains 2.7 million concepts harnessed automatically from a corpus of 1.68 billion web pages. Unlike traditional taxonomies that treat knowledge as black and white, it uses probabilities to model inconsistent, ambiguous and uncertain information it contains. We present details of how the taxonomy is constructed, its probabilistic modeling, and its potential applications in text understanding.","PeriodicalId":212616,"journal":{"name":"Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130891353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"bLSM: a general purpose log structured merge tree","authors":"R. Sears, R. Ramakrishnan","doi":"10.1145/2213836.2213862","DOIUrl":"https://doi.org/10.1145/2213836.2213862","url":null,"abstract":"Data management workloads are increasingly write-intensive and subject to strict latency SLAs. This presents a dilemma: Update in place systems have unmatched latency but poor write throughput. In contrast, existing log structured techniques improve write throughput but sacrifice read performance and exhibit unacceptable latency spikes. We begin by presenting a new performance metric: read fanout, and argue that, with read and write amplification, it better characterizes real-world indexes than approaches such as asymptotic analysis and price/performance. We then present bLSM, a Log Structured Merge (LSM) tree with the advantages of B-Trees and log structured approaches: (1) Unlike existing log structured trees, bLSM has near-optimal read and scan performance, and (2) its new \"spring and gear\" merge scheduler bounds write latency without impacting throughput or allowing merges to block writes for extended periods of time. It does this by ensuring merges at each level of the tree make steady progress without resorting to techniques that degrade read performance. We use Bloom filters to improve index performance, and find a number of subtleties arise. First, we ensure reads can stop after finding one version of a record. Otherwise, frequently written items would incur multiple B-Tree lookups. Second, many applications check for existing values at insert. Avoiding the seek performed by the check is crucial.","PeriodicalId":212616,"journal":{"name":"Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131712377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Materialized view selection for XQuery workloads","authors":"Asterios Katsifodimos, I. Manolescu, V. Vassalos","doi":"10.1145/2213836.2213900","DOIUrl":"https://doi.org/10.1145/2213836.2213900","url":null,"abstract":"The efficient processing of XQuery still poses significant challenges. A particularly effective technique to improve XQuery processing performance consists of using materialized views to answer queries. In this work, we consider the problem of choosing the best views to materialize within a given space budget in order to improve the performance of a query workload. The paper is the first to address the view selection problem for queries and views with value joins and multiple return nodes. The challenges we face stem from the expressive power and features of both the query and view languages and from the size of the search space of candidate views to materialize. While the general problem has prohibitive complexity, we propose and study a heuristic algorithm and demonstrate its superior performance compared to the state of the art.","PeriodicalId":212616,"journal":{"name":"Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data","volume":"241 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133583888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Database techniques for linked data management","authors":"A. Harth, K. Hose, Ralf Schenkel","doi":"10.1145/2213836.2213909","DOIUrl":"https://doi.org/10.1145/2213836.2213909","url":null,"abstract":"Linked Data refers to data published in accordance with a number of principles rooted in web standards. In the past few years we have witnessed a tremendous growth in Linked Data publishing on the web, leading to tens of billions of data items published online. Querying the data is a key functionality required to make use of the wealth of rich interlinked data. The goal of the tutorial is to introduce, motivate, and detail techniques for querying heterogeneous structured data from across the web. Our tutorial aims to introduce database researchers and practitioners to the new publishing paradigm on the web, and show how the abundance of data published as Linked Data can serve as fertile ground for database research and experimentation. As such, the tutorial focuses on applying database techniques to processing Linked Data, such as optimized indexing and query processing methods in the centralized setting as well as distributed approaches for querying. At the same time, we make the connection from Linked Data best practices to established technologies in distributed databases and the concept of Dataspaces and show differences as well as commonalities between the fields.","PeriodicalId":212616,"journal":{"name":"Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131665313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MCJoin: a memory-constrained join for column-store main-memory databases","authors":"S. Begley, Zhen He, Yi-Ping Phoebe Chen","doi":"10.1145/2213836.2213851","DOIUrl":"https://doi.org/10.1145/2213836.2213851","url":null,"abstract":"There exists a need for high performance, read-only main-memory database systems for OLAP-style application scenarios. Most of the existing works in this area are centered around the domain of column-store databases, which are particularly well suited to OLAP-style scenarios and have been shown to overcome the memory bottleneck issues that have been found to hinder the more traditional row-store database systems. One of the main database operations these systems are focused on optimizing is the JOIN operation. However, all these existing systems use join algorithms that are designed with the unrealistic assumption that there is unlimited temporary memory available to perform the join. In contrast, we propose a Memory Constrained Join algorithm (MCJoin) which is both high performing and also performs all of its operations within a tight given memory constraint. Extensive experimental results show that MCJoin outperforms a naive memory constrained version of the state-of-the-art Radix-Clustered Hash Join algorithm in all of the situations tested, with margins of up to almost 500%.","PeriodicalId":212616,"journal":{"name":"Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124233441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ReStore: reusing results of MapReduce jobs in pig","authors":"Iman Elghandour, Ashraf Aboulnaga","doi":"10.1145/2213836.2213937","DOIUrl":"https://doi.org/10.1145/2213836.2213937","url":null,"abstract":"Analyzing large scale data has become an important activity for many organizations, and is now facilitated by the MapReduce programming and execution model and its implementations, most notably Hadoop. Query languages such as Pig Latin, Hive, and Jaql make it simpler for users to express complex analysis tasks, and the compilers of these languages translate these complex tasks into workflows of MapReduce jobs. Each job in these workflows reads its input from the distributed file system used by the MapReduce system (e.g., HDFS in the case of Hadoop) and produces output that is stored in this distributed file system. This output is then read as input by the next job in the workflow. The current practice is to delete these intermediate results from the distributed file system at the end of executing the workflow. It would be more useful if these intermediate results can be stored and reused in future workflows. We demonstrate ReStore, an extension to Pig that enables it to manage storage and reuse of intermediate results of the MapReduce workflows executed in the Pig data analysis system. ReStore matches input workflows of MapReduce jobs with previously executed jobs and rewrites these workflows to reuse the stored results of the matched jobs. ReStore also creates additional reuse opportunities by materializing and reserving the output of query execution operators that are executed within a MapReduce job. In this demonstration we showcase the MapReduce jobs and sub-jobs recommended by ReStore for a given Pig query, the rewriting of input queries to reuse stored intermediate results, and a what-if analysis of the effectiveness of reusing stored outputs of previously executed jobs.","PeriodicalId":212616,"journal":{"name":"Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data","volume":"42 5-7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123372115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TIRAMOLA: elastic nosql provisioning through a cloud management platform","authors":"I. Konstantinou, E. Angelou, Dimitrios Tsoumakos, Christina Boumpouka, N. Koziris, S. Sioutas","doi":"10.1145/2213836.2213943","DOIUrl":"https://doi.org/10.1145/2213836.2213943","url":null,"abstract":"NoSQL databases focus on analytical processing of large scale datasets, offering increased scalability over commodity hardware. One of their strongest features is elasticity, which allows for fairly portioned premiums and high-quality performance. Yet, the process of adaptive expansion and contraction of resources usually involves a lot of manual effort, often requiring the definition of the conditions for scaling up or down to be provided by the users. To date, there exists no open-source system for automatic resizing of NoSQL clusters. In this demonstration, we present TIRAMOLA, a modular, cloud-enabled framework for monitoring and adaptively resizing NoSQL clusters. Our system incorporates a decision-making module which allows for optimal cluster resize actions in order to maximize any quantifiable reward function provided together with life-long adaptation to workload or infrastructural changes. The audience will be able to initiate HBase clusters of various sizes and apply varying workloads through multiple YCSB clients. The attendees will be able to watch, in real-time, the system perform automatic VM additions and removals as well as how cluster performance metrics change relative to the optimization parameters of their choice.","PeriodicalId":212616,"journal":{"name":"Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122501997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RACE: real-time applications over cloud-edge","authors":"B. Chandramouli, J. Claessens, Suman Nath, I. Santos, Wenchao Zhou","doi":"10.1145/2213836.2213916","DOIUrl":"https://doi.org/10.1145/2213836.2213916","url":null,"abstract":"The Cloud-Edge topology - where multiple smart edge devices such as phones are connected to one another via the Cloud - is becoming ubiquitous. We demonstrate RACE, a novel framework and system for specifying and efficiently executing distributed real-time applications in the Cloud-Edge topology. RACE uses LINQ for StreamInsight to succinctly express a diverse suite of useful real-time applications. Further, it exploits the processing power of edge devices and the Cloud to partition and execute such queries in a distributed manner. RACE features a novel cost-based optimizer that efficiently finds the optimal placement, minimizing global communication cost while handling multi-level join queries and asymmetric network links.","PeriodicalId":212616,"journal":{"name":"Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121448446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PrefDB: bringing preferences closer to the DBMS","authors":"Anastasios Arvanitis, G. Koutrika","doi":"10.1145/2213836.2213927","DOIUrl":"https://doi.org/10.1145/2213836.2213927","url":null,"abstract":"In this demonstration we present a preference-aware relational query answering system, termed PrefDB. The key novelty of PrefDB is the use of an extended relational data model and algebra that allow expressing different flavors of preferential queries. Furthermore, unlike existing approaches that either treat the DBMS as a black box or require modifications of the database core, PrefDB's hybrid implementation enables operator-level query optimizations without being obtrusive to the database engine. We showcase the flexibility and efficiency of PrefDB using PrefDBAdmin, a graphical tool that we have built aiming at assisting application designers in the task of building, testing and tuning queries with preferences.","PeriodicalId":212616,"journal":{"name":"Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125186579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}