Components and Rationale of a Big Data Toolkit Spanning HPC, Grid, Edge and Cloud Computing

G. Fox
Proceedings of the 10th International Conference on Utility and Cloud Computing, published 2017-12-05. DOI: 10.1145/3147213.3155012 (https://doi.org/10.1145/3147213.3155012)
Citations: 7

Abstract

We look again at Big Data programming environments such as Hadoop, Spark, Flink, Heron, and Pregel; HPC concepts such as MPI and Asynchronous Many-Task runtimes; and Cloud/Grid/Edge ideas such as event-driven computing, serverless computing, workflow, and services. These span many research communities, including distributed systems, databases, cyberphysical systems, and parallel computing, which sometimes have inconsistent worldviews. Many capabilities are common across these systems but are often implemented differently in each packaged environment. For example, communication can be bulk synchronous processing or data flow; scheduling can be dynamic or static; state and fault tolerance can follow different models; execution and data can be streaming or batch, distributed or local. We suggest that one can usefully build a toolkit (which we call Twister2) that supports these different choices and allows fruitful customization for each application area. We illustrate the design of Twister2 with several point studies. We stress the many open questions in very traditional areas, including scheduling, messaging, and checkpointing.
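
The design choices listed in the abstract can be pictured as independent axes of a configuration. The following is only a minimal sketch under that assumption: `ToolkitConfig`, the enum names, the two presets, and `all_configs` are hypothetical illustrations and are not the Twister2 API. State and fault-tolerance models are left out because the abstract does not name them.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Communication(Enum):
    BULK_SYNCHRONOUS = "bulk synchronous processing"
    DATAFLOW = "data flow"


class Scheduling(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"


class Execution(Enum):
    BATCH = "batch"
    STREAMING = "streaming"


class Placement(Enum):
    LOCAL = "local"
    DISTRIBUTED = "distributed"


@dataclass(frozen=True)
class ToolkitConfig:
    """One combination of the choices named in the abstract (hypothetical type)."""
    communication: Communication
    scheduling: Scheduling
    execution: Execution
    placement: Placement

    def describe(self) -> str:
        # Render the selected choices as a readable summary.
        return (f"{self.communication.value} communication, "
                f"{self.scheduling.value} scheduling, "
                f"{self.execution.value} execution, "
                f"{self.placement.value} data")


# Illustrative presets only; the pairings are assumptions, not claims about
# how any particular system is configured.
BSP_BATCH = ToolkitConfig(Communication.BULK_SYNCHRONOUS, Scheduling.STATIC,
                          Execution.BATCH, Placement.DISTRIBUTED)
DATAFLOW_STREAMING = ToolkitConfig(Communication.DATAFLOW, Scheduling.DYNAMIC,
                                   Execution.STREAMING, Placement.DISTRIBUTED)


def all_configs() -> list[ToolkitConfig]:
    """Enumerate every combination of the four axes."""
    return [ToolkitConfig(c, s, e, p)
            for c, s, e, p in product(Communication, Scheduling,
                                      Execution, Placement)]


if __name__ == "__main__":
    print(BSP_BATCH.describe())
    print(DATAFLOW_STREAMING.describe())
    print(len(all_configs()), "combinations")
```

Treating each choice as a separate axis makes the abstract's point concrete: a packaged environment fixes one combination, while a toolkit would let an application area select its own.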