{"title":"在Apache Spark中实现动态SQL编译","authors":"F. Schiavio, Daniele Bonetta, Walter Binder","doi":"10.1145/3397537.3397566","DOIUrl":null,"url":null,"abstract":"Big-data systems have gained significant momentum, and Apache Spark is becoming a de-facto standard for modern data analytics. Spark relies on code generation to optimize the execution performance of SQL queries on a variety of data sources. Despite its already efficient runtime, Spark's code generation suffers from significant runtime overheads related to data de-serialization during query execution. Such performance penalty can be significant, especially when applications operate on human-readable data formats such as CSV or JSON.","PeriodicalId":373173,"journal":{"name":"Companion Proceedings of the 4th International Conference on Art, Science, and Engineering of Programming","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Towards dynamic SQL compilation in Apache Spark\",\"authors\":\"F. Schiavio, Daniele Bonetta, Walter Binder\",\"doi\":\"10.1145/3397537.3397566\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Big-data systems have gained significant momentum, and Apache Spark is becoming a de-facto standard for modern data analytics. Spark relies on code generation to optimize the execution performance of SQL queries on a variety of data sources. Despite its already efficient runtime, Spark's code generation suffers from significant runtime overheads related to data de-serialization during query execution. Such performance penalty can be significant, especially when applications operate on human-readable data formats such as CSV or JSON.\",\"PeriodicalId\":373173,\"journal\":{\"name\":\"Companion Proceedings of the 4th International Conference on Art, Science, and Engineering of Programming\",\"volume\":\"39 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-03-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Companion Proceedings of the 4th International Conference on Art, Science, and Engineering of Programming\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3397537.3397566\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Companion Proceedings of the 4th International Conference on Art, Science, and Engineering of Programming","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3397537.3397566","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Big-data systems have gained significant momentum, and Apache Spark is becoming a de facto standard for modern data analytics. Spark relies on code generation to optimize the execution performance of SQL queries over a variety of data sources. Despite its already efficient runtime, Spark's code generation suffers from significant runtime overheads related to data de-serialization during query execution. This performance penalty can be substantial, especially when applications operate on human-readable data formats such as CSV or JSON.
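A minimal sketch (not from the paper) of the scenario the abstract describes: a Spark SQL query over a human-readable CSV source, where every row must be parsed (de-serialized) into Spark's internal row format before the code generated for the query can run. The file path, column names, and application name are hypothetical; the Spark APIs shown are standard.

```scala
import org.apache.spark.sql.SparkSession

object CsvQueryExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-deserialization-example") // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    // Reading a CSV file: each text line is parsed into Spark's internal
    // row representation at runtime, a per-row de-serialization cost.
    val orders = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/orders.csv") // hypothetical path

    orders.createOrReplaceTempView("orders")

    // Spark compiles this query with whole-stage code generation, but the
    // CSV parsing step is not specialized away by the generated code.
    val result = spark.sql(
      "SELECT customer_id, SUM(price) AS total FROM orders GROUP BY customer_id")

    // Inspect the generated Java code for the physical plan (Spark 3.x).
    result.explain("codegen")
    result.show()

    spark.stop()
  }
}
```

Running `result.explain("codegen")` prints the Java source produced by whole-stage code generation, which makes it possible to see that the generated operators consume already de-serialized internal rows rather than the raw CSV text.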