Distributed Parallel Analysis Engine for High Energy Physics Using AWS Lambda

Proceedings of the 1st Workshop on High Performance Serverless Computing. Pub Date: 2020-06-25. DOI: 10.1145/3452413.3464788

Jacek Kusnierz, M. Malawski, V. Padulano, E. T. Saavedra, P. Alonso-Jordá

Citations: 3

Abstract

The High-Energy Physics experiments at CERN produce a high volume of data, and no single machine can analyze large portions of it within a reasonable time. The ROOT framework was recently extended with distributed computing capabilities for massively parallel RDataFrame applications. This approach, which uses the MapReduce pattern underneath, makes heavy computations much more approachable, even for newcomers. This paper explores the possibility of running such analyses on serverless services in the public cloud, using a purely stateless environment. So far, the distributed backends of RDataFrame have relied on stateful, fully managed computing frameworks such as Apache Spark. Here we show that our newly developed tool can run on fully stateless cloud functions, and our benchmarks demonstrate excellent speedup in the parallel stage of processing.
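The MapReduce pattern mentioned above can be illustrated with a minimal sketch, assuming a toy analysis that fills a single histogram. This is not the authors' tool and not ROOT's API: the entry range is split into chunks, each chunk is processed by a stateless mapper that receives everything it needs as its argument (as a cloud function invocation would), and the partial histograms are summed in a reduce step. The names `split_ranges`, `read_value`, `map_task`, and `run_analysis`, the synthetic data, and the use of local threads as a stand-in for concurrent function invocations are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical histogram configuration: 10 equal-width bins over [0, 100).
N_BINS = 10
LOW, HIGH = 0.0, 100.0


def split_ranges(n_entries, n_chunks):
    """Split [0, n_entries) into at most n_chunks contiguous (start, stop) ranges."""
    if n_entries <= 0:
        return []
    step = -(-n_entries // n_chunks)  # ceiling division
    return [(start, min(start + step, n_entries)) for start in range(0, n_entries, step)]


def read_value(entry):
    """Hypothetical stand-in for reading one event's observable from a dataset."""
    return float((entry * 37) % 100)


def map_task(entry_range):
    """Stateless mapper: depends only on its argument, returns a partial histogram."""
    start, stop = entry_range
    counts = [0] * N_BINS
    width = (HIGH - LOW) / N_BINS
    for entry in range(start, stop):
        x = read_value(entry)
        if LOW <= x < HIGH:
            counts[int((x - LOW) / width)] += 1
    return counts


def merge(a, b):
    """Reduce step: element-wise sum of two partial histograms."""
    return [x + y for x, y in zip(a, b)]


def run_analysis(n_entries, n_workers):
    """Map over chunks concurrently (threads stand in for cloud functions), then reduce."""
    ranges = split_ranges(n_entries, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_task, ranges))
    return reduce(merge, partials, [0] * N_BINS)


if __name__ == "__main__":
    print(run_analysis(1000, 8))
```

Because each mapper keeps no state between calls and the merge is an associative sum, the result does not depend on how many chunks are used, which is what allows the map stage to be spread across independent stateless workers.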

