Jacek Kusnierz, M. Malawski, V. Padulano, E. T. Saavedra, P. Alonso-Jordá
Proceedings of the 1st Workshop on High Performance Serverless Computing. Publication date: 2020-06-25. DOI: 10.1145/3452413.3464788 (https://doi.org/10.1145/3452413.3464788)
Distributed Parallel Analysis Engine for High Energy Physics Using AWS Lambda
High-energy physics experiments at CERN produce large volumes of data, and no single machine can analyze large portions of it within a reasonable time. The ROOT framework was recently extended with distributed computing capabilities for massively parallel RDataFrame applications. This approach, built on the MapReduce pattern, makes heavy computations much more approachable, even for newcomers. This paper explores the possibility of running such analyses on serverless services in the public cloud, using a purely stateless environment. Until now, the distributed approaches used by RDataFrame have relied on stateful, fully managed computing frameworks such as Apache Spark. Here we show that our newly developed tool can run on fully stateless cloud functions, and our benchmarks demonstrate excellent speedup in the parallel stage of processing.
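The MapReduce pattern with stateless workers mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' tool and it uses neither ROOT nor AWS Lambda: the names `split_ranges`, `mapper`, `reducer`, and `run` are hypothetical, a local thread pool stands in for cloud function invocations, and a toy formula stands in for reading events from a file. The only point is that each map task receives everything it needs as input and returns a partial histogram that is merged afterwards.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_ranges(n_entries, n_chunks):
    """Split [0, n_entries) into at most n_chunks contiguous (start, stop) ranges."""
    if n_entries <= 0:
        return []
    step = -(-n_entries // n_chunks)  # ceiling division
    return [(start, min(start + step, n_entries))
            for start in range(0, n_entries, step)]


def mapper(task):
    """Stateless map step: build a partial histogram for one entry range.

    All inputs arrive in `task` and nothing is kept between calls, which is
    the property a cloud function would rely on.
    """
    start, stop, n_bins, lo, hi = task
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for entry in range(start, stop):
        # Hypothetical stand-in for reading one event from a dataset.
        value = (entry * 37) % 100
        if lo <= value < hi:
            counts[int((value - lo) / width)] += 1
    return counts


def reducer(left, right):
    """Reduce step: merge two partial histograms bin by bin."""
    return [a + b for a, b in zip(left, right)]


def run(n_entries, n_chunks, n_bins=10, lo=0.0, hi=100.0):
    """Map over entry ranges in parallel, then reduce the partial results."""
    tasks = [(start, stop, n_bins, lo, hi)
             for start, stop in split_ranges(n_entries, n_chunks)]
    # Assumption: a local thread pool replaces remote function invocations.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(mapper, tasks))
    return reduce(reducer, partials, [0] * n_bins)


if __name__ == "__main__":
    print(run(1000, 8))
```

Because the mapper holds no state, the merged histogram does not depend on how many chunks the entry range is split into; only the degree of parallelism in the map stage changes.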