Shasta: Interactive Reporting At Scale

Proceedings of the 2016 International Conference on Management of Data. Pub Date: 2016-06-14. DOI: 10.1145/2882903.2904444

G. Manoharan, Stephan Ellner, Karl Schnaitter, Sridatta Chegu, Alejandro Estrella-Balderrama, Stephan Gudmundson, Apurv Gupta, B. Handy, Bart Samwel, Chad Whipkey, Larysa Aharkava, Himani Apte, Nitin Gangahar, Jun Xu, S. Venkataraman, D. Agrawal, J. Ullman


Citations: 5

Abstract

We describe Shasta, a middleware system built at Google to support interactive reporting in complex user-facing applications related to Google's Internet advertising business. Shasta targets applications with challenging requirements: First, user query latencies must be low. Second, underlying transactional data stores have complex "read-unfriendly" schemas, placing significant transformation logic between stored data and the read-only views that Shasta exposes to its clients. This transformation logic must be expressed in a way that scales to large and agile engineering teams. Finally, Shasta targets applications with strong data freshness requirements, making it challenging to precompute query results using common techniques such as ETL pipelines or materialized views. Instead, online queries must go all the way from primary storage to user-facing views, resulting in complex queries joining 50 or more tables. Designed as a layer on top of Google's F1 RDBMS and Mesa data warehouse, Shasta combines language and system techniques to meet these requirements. To help with expressing complex view specifications, we developed a query language called RVL, with support for modularized view templates that can be dynamically compiled into SQL. To execute these SQL queries with low latency at scale, we leveraged and extended F1's distributed query engine with facilities such as safe execution of C++ and Java UDFs. To reduce latency and increase read parallelism, we extended F1 storage with a distributed read-only in-memory cache. The system we describe is in production at Google, powering critical applications used by advertisers and internal sales teams. Shasta has significantly improved system scalability and software engineering efficiency compared to the middleware solutions it replaced.
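The abstract mentions modularized view templates that are dynamically compiled into SQL. The sketch below illustrates that general idea only; it is not RVL syntax and not Shasta's implementation. The `ViewTemplate` class, the `compile_query` helper, and the table and column names (`ad_events`, `clicks`, `campaign_id`) are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class ViewTemplate:
    """Hypothetical parameterized view: a SQL fragment with $placeholders."""
    name: str
    sql: Template

    def instantiate(self, **params: str) -> str:
        # Fill in the placeholders; raises KeyError if one is missing.
        return self.sql.substitute(**params)


def compile_query(views: list[tuple[str, str]], final_select: str) -> str:
    """Chain instantiated views as CTEs in front of the final SELECT."""
    ctes = ",\n".join(f"{name} AS (\n  {body}\n)" for name, body in views)
    return f"WITH {ctes}\n{final_select}"


# Assumed example: one reusable aggregation template, instantiated per request.
campaign_stats = ViewTemplate(
    name="campaign_stats",
    sql=Template(
        "SELECT campaign_id, SUM($metric) AS total "
        "FROM $source GROUP BY campaign_id"
    ),
)
body = campaign_stats.instantiate(metric="clicks", source="ad_events")
sql = compile_query(
    [(campaign_stats.name, body)],
    "SELECT campaign_id, total FROM campaign_stats",
)
print(sql)
```

Keeping each view as a small named template lets different teams own different fragments, while the final SQL is only assembled at request time. Plain string substitution is used here for brevity; a real system would need proper escaping and validation of parameters.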

