{"title":"Lambda-Blocks: Data Processing with Topologies of Blocks","authors":"Matthieu Caneill, N. D. Palma","doi":"10.1109/BigDataCongress.2018.00009","DOIUrl":null,"url":null,"abstract":"We present and evaluate λ-blocks, a novel framework to write data processing programs in a descriptive manner. The main idea behind this framework is to separate the semantics of a program from its implementation. For that purpose, we define a data schema, able to describe, parameterize, compose, and link together blocks of code, storing a directed graph which represents the data transformations. Along this data schema lies an execution engine, able to read such a program, give feedback on potential errors, and finally execute it. In our reference implementation, a computation graph is described in YAML, linking together vertices of Python code blocks defined in separate libraries. The advantages of this approach are manyfold: faster, less error-prone programming; reuse of code blocks; computation graph manipulations; mixing of different specialized libraries; and finally middleware for potential front-ends (such as graphical interfaces) and back-ends (other execution engines). The main goal of λ-blocks is to bring complex data processing computations to non-specialists, by providing a simple abstraction over large-scale data processing systems. Our contributions lie within a description of the schema, and an analysis of the reference execution engine. For that purpose we describe λ-blocks' internals and its main abstractions (blocks and topologies), and evaluate the framework performances. We measured the framework overhead to have a maximum value of 50 ms, a negligible amount compared to the average duration of data processing jobs.","PeriodicalId":177250,"journal":{"name":"2018 IEEE International Congress on Big Data (BigData Congress)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Congress on Big Data (BigData Congress)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BigDataCongress.2018.00009","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
We present and evaluate λ-blocks, a novel framework to write data processing programs in a descriptive manner. The main idea behind this framework is to separate the semantics of a program from its implementation. For that purpose, we define a data schema, able to describe, parameterize, compose, and link together blocks of code, storing a directed graph which represents the data transformations. Along this data schema lies an execution engine, able to read such a program, give feedback on potential errors, and finally execute it. In our reference implementation, a computation graph is described in YAML, linking together vertices of Python code blocks defined in separate libraries. The advantages of this approach are manyfold: faster, less error-prone programming; reuse of code blocks; computation graph manipulations; mixing of different specialized libraries; and finally middleware for potential front-ends (such as graphical interfaces) and back-ends (other execution engines). The main goal of λ-blocks is to bring complex data processing computations to non-specialists, by providing a simple abstraction over large-scale data processing systems. Our contributions lie within a description of the schema, and an analysis of the reference execution engine. For that purpose we describe λ-blocks' internals and its main abstractions (blocks and topologies), and evaluate the framework performances. We measured the framework overhead to have a maximum value of 50 ms, a negligible amount compared to the average duration of data processing jobs.