FACADE: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications

Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems. Pub Date: 2015-03-14. DOI: 10.1145/2694344.2694345

Khanh Nguyen, Kai Wang, Yingyi Bu, Lu Fang, Jianfei Hu, G. Xu


Citations: 100

Abstract

The past decade has witnessed an increasing demand for data-driven business intelligence, which has led to the proliferation of data-intensive applications. A managed object-oriented programming language such as Java is often the developer's choice for implementing such applications, due to its quick development cycle and rich community resources. While the use of such languages makes programming easier, their automated memory management comes at a cost. When the managed runtime meets Big Data, this cost is significantly magnified and becomes a scalability-prohibiting bottleneck. This paper presents a novel compiler framework, called Facade, that can generate highly efficient data manipulation code by automatically transforming the data path of an existing Big Data application. The key treatment is that in the generated code, the number of runtime heap objects created for data types in each thread is (almost) statically bounded, leading to significantly reduced memory management cost and improved scalability. We have implemented Facade and used it to transform 7 common applications on 3 real-world, already well-optimized Big Data frameworks: GraphChi, Hyracks, and GPS. Our experimental results are very positive: the generated programs have (1) achieved a 3%–48% execution time reduction and an up to 88X GC reduction; (2) consumed up to 50% less memory; and (3) scaled to much larger datasets.
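To make the idea of an (almost) statically bounded number of heap objects more concrete, here is a minimal hand-written sketch in Java. It is not code generated by Facade and does not reproduce the paper's actual transformation; the names `FacadeSketch`, `PointStore`, `PointFacade`, and `sumOfNorms` are hypothetical and chosen only for illustration. The sketch assumes that record data can be kept in a single buffer outside the Java heap and reached through one reusable accessor object, so that the number of heap objects the garbage collector must track does not grow with the number of records.

```java
import java.nio.ByteBuffer;

public class FacadeSketch {

    /** Hypothetical store: every (x, y) record lives in one off-heap buffer. */
    static final class PointStore {
        static final int RECORD_BYTES = 2 * Double.BYTES; // fields: x, y
        private final ByteBuffer page;
        private int count = 0;

        PointStore(int capacity) {
            // One direct buffer holds all records; no per-record Java object.
            page = ByteBuffer.allocateDirect(capacity * RECORD_BYTES);
        }

        /** Appends a record and returns its index, which serves as its reference. */
        int add(double x, double y) {
            int base = count * RECORD_BYTES;
            page.putDouble(base, x);
            page.putDouble(base + Double.BYTES, y);
            return count++;
        }

        int size() { return count; }
        double x(int ref) { return page.getDouble(ref * RECORD_BYTES); }
        double y(int ref) { return page.getDouble(ref * RECORD_BYTES + Double.BYTES); }
    }

    /** Reusable accessor: a single heap object that can stand in for any record. */
    static final class PointFacade {
        private final PointStore store;
        private int ref;

        PointFacade(PointStore store) { this.store = store; }

        /** Rebinds this accessor to another record instead of allocating a new object. */
        PointFacade bind(int ref) { this.ref = ref; return this; }

        double norm() { return Math.hypot(store.x(ref), store.y(ref)); }
    }

    /** Visits every record while allocating exactly one accessor object. */
    static double sumOfNorms(PointStore store) {
        PointFacade facade = new PointFacade(store);
        double total = 0.0;
        for (int i = 0; i < store.size(); i++) {
            total += facade.bind(i).norm();
        }
        return total;
    }

    public static void main(String[] args) {
        PointStore store = new PointStore(1_000);
        for (int i = 0; i < 1_000; i++) {
            store.add(3.0 * i, 4.0 * i);
        }
        System.out.println(sumOfNorms(store));
    }
}
```

In this sketch the loop touches 1,000 records but creates only the store, its buffer, and one accessor on the heap. A conventional design with one `Point` object per record would instead create heap objects in proportion to the data size, which is the cost the abstract describes as being magnified on Big Data workloads.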

