Performance Study of Spindle, A Web Analytics Query Engine Implemented in Spark
Brandon Amos, David Tompkins
2014 IEEE 6th International Conference on Cloud Computing Technology and Science, published 2014-12-15
DOI: https://doi.org/10.1109/CloudCom.2014.111
This paper shares our experiences building and benchmarking Spindle, an open-source, Spark-based web analytics platform. Spindle's design is motivated by real-world queries and data that require concurrent, low-latency query execution. We identify a search space of Spark tuning options and study their impact on Spark's performance. Results from a self-hosted six-node cluster with one week of analytics data (13.1 GB) indicate that tuning options such as proper partitioning can yield a 5x performance improvement.
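To make the idea of a "search space of Spark tuning options" concrete, the following is a minimal sketch of how such a space might be enumerated and how a speedup figure like 5x could be computed. The property names are real Spark configuration keys, but the choice of keys, the candidate values, and the helper names `enumerate_configs` and `speedup` are illustrative assumptions; they are not taken from the paper and do not reflect Spindle's actual benchmark harness.

```python
from itertools import product

# Hypothetical search space. The keys are real Spark configuration properties,
# but the candidate values are assumptions chosen only for illustration.
SEARCH_SPACE = {
    "spark.default.parallelism": [12, 48, 192],
    "spark.serializer": [
        "org.apache.spark.serializer.JavaSerializer",
        "org.apache.spark.serializer.KryoSerializer",
    ],
    "spark.rdd.compress": ["false", "true"],
}


def enumerate_configs(space):
    """Return every combination of option values as a list of dicts."""
    keys = sorted(space)
    return [
        dict(zip(keys, values))
        for values in product(*(space[k] for k in keys))
    ]


def speedup(baseline_seconds, tuned_seconds):
    """Ratio of baseline to tuned query latency (higher is better)."""
    return baseline_seconds / tuned_seconds


configs = enumerate_configs(SEARCH_SPACE)
```

Each dictionary in `configs` could be passed to a benchmark run, with the measured latencies compared through `speedup`. An exhaustive grid like this grows multiplicatively with each added option, so a real study would likely restrict the values per option to keep the number of runs manageable.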