Lambda-Blocks:基于块拓扑的数据处理

Matthieu Caneill, N. D. Palma
{"title":"Lambda-Blocks:基于块拓扑的数据处理","authors":"Matthieu Caneill, N. D. Palma","doi":"10.1109/BigDataCongress.2018.00009","DOIUrl":null,"url":null,"abstract":"We present and evaluate λ-blocks, a novel framework to write data processing programs in a descriptive manner. The main idea behind this framework is to separate the semantics of a program from its implementation. For that purpose, we define a data schema, able to describe, parameterize, compose, and link together blocks of code, storing a directed graph which represents the data transformations. Along this data schema lies an execution engine, able to read such a program, give feedback on potential errors, and finally execute it. In our reference implementation, a computation graph is described in YAML, linking together vertices of Python code blocks defined in separate libraries. The advantages of this approach are manyfold: faster, less error-prone programming; reuse of code blocks; computation graph manipulations; mixing of different specialized libraries; and finally middleware for potential front-ends (such as graphical interfaces) and back-ends (other execution engines). The main goal of λ-blocks is to bring complex data processing computations to non-specialists, by providing a simple abstraction over large-scale data processing systems. Our contributions lie within a description of the schema, and an analysis of the reference execution engine. For that purpose we describe λ-blocks' internals and its main abstractions (blocks and topologies), and evaluate the framework performances. We measured the framework overhead to have a maximum value of 50 ms, a negligible amount compared to the average duration of data processing jobs.","PeriodicalId":177250,"journal":{"name":"2018 IEEE International Congress on Big Data (BigData Congress)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Lambda-Blocks: Data Processing with Topologies of Blocks\",\"authors\":\"Matthieu Caneill, N. D. Palma\",\"doi\":\"10.1109/BigDataCongress.2018.00009\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present and evaluate λ-blocks, a novel framework to write data processing programs in a descriptive manner. The main idea behind this framework is to separate the semantics of a program from its implementation. For that purpose, we define a data schema, able to describe, parameterize, compose, and link together blocks of code, storing a directed graph which represents the data transformations. Along this data schema lies an execution engine, able to read such a program, give feedback on potential errors, and finally execute it. In our reference implementation, a computation graph is described in YAML, linking together vertices of Python code blocks defined in separate libraries. The advantages of this approach are manyfold: faster, less error-prone programming; reuse of code blocks; computation graph manipulations; mixing of different specialized libraries; and finally middleware for potential front-ends (such as graphical interfaces) and back-ends (other execution engines). The main goal of λ-blocks is to bring complex data processing computations to non-specialists, by providing a simple abstraction over large-scale data processing systems. Our contributions lie within a description of the schema, and an analysis of the reference execution engine. For that purpose we describe λ-blocks' internals and its main abstractions (blocks and topologies), and evaluate the framework performances. We measured the framework overhead to have a maximum value of 50 ms, a negligible amount compared to the average duration of data processing jobs.\",\"PeriodicalId\":177250,\"journal\":{\"name\":\"2018 IEEE International Congress on Big Data (BigData Congress)\",\"volume\":\"34 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE International Congress on Big Data (BigData Congress)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/BigDataCongress.2018.00009\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Congress on Big Data (BigData Congress)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BigDataCongress.2018.00009","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

我们提出并评估了λ块,这是一种以描述性方式编写数据处理程序的新框架。该框架背后的主要思想是将程序的语义与其实现分离开来。为此,我们定义了一个数据模式,它能够描述、参数化、组合和链接代码块,并存储表示数据转换的有向图。在这个数据模式中有一个执行引擎,它能够读取这样的程序,对潜在的错误给出反馈,并最终执行它。在我们的参考实现中,计算图是用YAML描述的,它将在不同库中定义的Python代码块的顶点链接在一起。这种方法的优点是多方面的:更快,更少出错的编程;代码块的重用;计算图操作;混合使用不同的专业库;最后是用于潜在前端(如图形界面)和后端(其他执行引擎)的中间件。λ-blocks的主要目标是通过在大规模数据处理系统上提供一个简单的抽象,将复杂的数据处理计算带给非专业人士。我们的贡献在于对模式的描述和对引用执行引擎的分析。为此,我们描述了λ-blocks的内部结构及其主要抽象(块和拓扑),并评估了框架的性能。我们测量的框架开销最大值为50 ms,与数据处理作业的平均持续时间相比,这个值可以忽略不计。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Lambda-Blocks: Data Processing with Topologies of Blocks
We present and evaluate λ-blocks, a novel framework to write data processing programs in a descriptive manner. The main idea behind this framework is to separate the semantics of a program from its implementation. For that purpose, we define a data schema, able to describe, parameterize, compose, and link together blocks of code, storing a directed graph which represents the data transformations. Along this data schema lies an execution engine, able to read such a program, give feedback on potential errors, and finally execute it. In our reference implementation, a computation graph is described in YAML, linking together vertices of Python code blocks defined in separate libraries. The advantages of this approach are manyfold: faster, less error-prone programming; reuse of code blocks; computation graph manipulations; mixing of different specialized libraries; and finally middleware for potential front-ends (such as graphical interfaces) and back-ends (other execution engines). The main goal of λ-blocks is to bring complex data processing computations to non-specialists, by providing a simple abstraction over large-scale data processing systems. Our contributions lie within a description of the schema, and an analysis of the reference execution engine. For that purpose we describe λ-blocks' internals and its main abstractions (blocks and topologies), and evaluate the framework performances. We measured the framework overhead to have a maximum value of 50 ms, a negligible amount compared to the average duration of data processing jobs.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信