MapReduce program synthesis

Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation
Pub Date: 2016-06-02
DOI: 10.1145/2908080.2908102

Calvin Smith, Aws Albarghouthi

Citations: 91

Abstract

By abstracting away the complexity of distributed systems, large-scale data processing platforms—MapReduce, Hadoop, Spark, Dryad, etc.—have provided developers with simple means for harnessing the power of the cloud. In this paper, we ask whether we can automatically synthesize MapReduce-style distributed programs from input–output examples. Our ultimate goal is to enable end users to specify large-scale data analyses through the simple interface of examples. We thus present a new algorithm and tool for synthesizing programs composed of efficient data-parallel operations that can execute on cloud computing infrastructure. We evaluate our tool on a range of real-world big-data analysis tasks and general computations. Our results demonstrate the efficiency of our approach and the small number of examples it requires to synthesize correct, scalable programs.
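To make the idea of synthesis from input–output examples concrete, the following is a minimal sketch, not the algorithm or tool described in the paper. It assumes a tiny, hand-written pool of candidate mappers and reducers (the names `MAPPERS`, `REDUCERS`, `map_reduce`, and `synthesize` are all hypothetical) and simply enumerates combinations until one reproduces every example. A real synthesizer would search a far larger space and would need a smarter strategy than brute force.

```python
from collections import defaultdict
from functools import reduce


def map_reduce(records, mapper, reducer):
    """Apply mapper to each record, group emitted values by key, fold each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reduce(reducer, values) for key, values in groups.items()}


# Hypothetical candidate components; a real tool would generate these from a grammar.
MAPPERS = {
    "emit (word, 1)": lambda line: [(w, 1) for w in line.split()],
    "emit (word, len(word))": lambda line: [(w, len(w)) for w in line.split()],
    "emit (line, 1)": lambda line: [(line, 1)],
}
REDUCERS = {
    "sum": lambda a, b: a + b,
    "max": lambda a, b: max(a, b),
}


def synthesize(examples):
    """Return the first (mapper, reducer) name pair consistent with all examples, else None."""
    for m_name, mapper in MAPPERS.items():
        for r_name, reducer in REDUCERS.items():
            if all(map_reduce(inp, mapper, reducer) == out for inp, out in examples):
                return m_name, r_name
    return None


# One input-output example describing word count.
examples = [(["a b a", "b c"], {"a": 2, "b": 2, "c": 1})]
result = synthesize(examples)
print(result)
```

In this sketch a single example is enough to rule out every candidate except the word-count pair, because the repeated word "a" distinguishes summing from taking a maximum. An example in which every word occurs once would leave both reducers consistent, which is why the choice of examples matters.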

