Pado:在数据中心中利用瞬时资源的数据处理引擎

Proceedings of the Twelfth European Conference on Computer Systems Pub Date : 2017-04-23 DOI:10.1145/3064176.3064181

Youngseok Yang, Geon‐Woo Kim, Won Wook Song, Yunseong Lee, Andrew Chung, Zhengping Qian, Brian Cho, Byung-Gon Chun

{"title":"Pado:在数据中心中利用瞬时资源的数据处理引擎","authors":"Youngseok Yang, Geon‐Woo Kim, Won Wook Song, Yunseong Lee, Andrew Chung, Zhengping Qian, Brian Cho, Byung-Gon Chun","doi":"10.1145/3064176.3064181","DOIUrl":null,"url":null,"abstract":"Datacenters are under-utilized, primarily due to unused resources on over-provisioned nodes of latency-critical jobs. Such idle resources can be used to run batch data analytic jobs to increase datacenter utilization, but these transient resources must be evicted whenever latency-critical jobs require them again. Resource evictions often lead to cascading recomputations, which is usually handled by checkpointing intermediate results on stable storages of eviction-free reserved resources. However, checkpointing has major shortcomings in its substantial overhead of transferring data back and forth. In this work, we step away from such approaches and focus on observing the job structure and the relationships between computations of the job. We carefully mark the computations that are most likely to cause a large number of recomputations upon evictions, to run them reliably using reserved resources. This lets us retain corresponding intermediate results effortlessly without any additional checkpointing. We design Pado, a general data processing engine, which carries out our idea with several optimizations that minimize the number of additional reserved nodes. Evaluation results show that Pado outperforms Spark 2.0.0 by up to 5.1×, and checkpoint-enabled Spark by up to 3.8×.","PeriodicalId":262089,"journal":{"name":"Proceedings of the Twelfth European Conference on Computer Systems","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"48","resultStr":"{\"title\":\"Pado: A Data Processing Engine for Harnessing Transient Resources in Datacenters\",\"authors\":\"Youngseok Yang, Geon‐Woo Kim, Won Wook Song, Yunseong Lee, Andrew Chung, Zhengping Qian, Brian Cho, Byung-Gon Chun\",\"doi\":\"10.1145/3064176.3064181\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Datacenters are under-utilized, primarily due to unused resources on over-provisioned nodes of latency-critical jobs. Such idle resources can be used to run batch data analytic jobs to increase datacenter utilization, but these transient resources must be evicted whenever latency-critical jobs require them again. Resource evictions often lead to cascading recomputations, which is usually handled by checkpointing intermediate results on stable storages of eviction-free reserved resources. However, checkpointing has major shortcomings in its substantial overhead of transferring data back and forth. In this work, we step away from such approaches and focus on observing the job structure and the relationships between computations of the job. We carefully mark the computations that are most likely to cause a large number of recomputations upon evictions, to run them reliably using reserved resources. This lets us retain corresponding intermediate results effortlessly without any additional checkpointing. We design Pado, a general data processing engine, which carries out our idea with several optimizations that minimize the number of additional reserved nodes. Evaluation results show that Pado outperforms Spark 2.0.0 by up to 5.1×, and checkpoint-enabled Spark by up to 3.8×.\",\"PeriodicalId\":262089,\"journal\":{\"name\":\"Proceedings of the Twelfth European Conference on Computer Systems\",\"volume\":\"15 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-04-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"48\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Twelfth European Conference on Computer Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3064176.3064181\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Twelfth European Conference on Computer Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3064176.3064181","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 48

摘要

数据中心未得到充分利用，主要原因是延迟关键型作业的过度配置节点上未使用资源。这些空闲资源可用于运行批处理数据分析作业，以提高数据中心的利用率，但是当延迟关键型作业再次需要这些临时资源时，必须将它们驱逐出去。资源驱逐通常会导致级联重计算，这通常是通过在无驱逐保留资源的稳定存储上检查中间结果来处理的。然而，检查点的主要缺点在于来回传输数据的大量开销。在这项工作中，我们远离这些方法，专注于观察工作结构和工作计算之间的关系。我们仔细地标记了最有可能在驱逐时导致大量重新计算的计算，以便使用预留资源可靠地运行它们。这让我们可以毫不费力地保留相应的中间结果，而无需任何额外的检查点。我们设计了Pado，一个通用的数据处理引擎，它通过几个优化来实现我们的想法，这些优化最小化了额外保留节点的数量。评估结果表明，Pado比Spark 2.0.0性能高5.1倍，比启用检查点的Spark性能高3.8倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Pado: A Data Processing Engine for Harnessing Transient Resources in Datacenters

Datacenters are under-utilized, primarily due to unused resources on over-provisioned nodes of latency-critical jobs. Such idle resources can be used to run batch data analytic jobs to increase datacenter utilization, but these transient resources must be evicted whenever latency-critical jobs require them again. Resource evictions often lead to cascading recomputations, which is usually handled by checkpointing intermediate results on stable storages of eviction-free reserved resources. However, checkpointing has major shortcomings in its substantial overhead of transferring data back and forth. In this work, we step away from such approaches and focus on observing the job structure and the relationships between computations of the job. We carefully mark the computations that are most likely to cause a large number of recomputations upon evictions, to run them reliably using reserved resources. This lets us retain corresponding intermediate results effortlessly without any additional checkpointing. We design Pado, a general data processing engine, which carries out our idea with several optimizations that minimize the number of additional reserved nodes. Evaluation results show that Pado outperforms Spark 2.0.0 by up to 5.1×, and checkpoint-enabled Spark by up to 3.8×.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the Twelfth European Conference on Computer Systems

自引率

0.00%

发文量