Components and Rationale of a Big Data Toolkit Spanning HPC, Grid, Edge and Cloud Computing

Proceedings of the 10th International Conference on Utility and Cloud Computing. Pub Date: 2017-12-05. DOI: 10.1145/3147213.3155012

G. Fox


Citations: 7

Abstract


We look again at Big Data programming environments such as Hadoop, Spark, Flink, Heron, and Pregel; HPC concepts such as MPI and Asynchronous Many-Task runtimes; and Cloud/Grid/Edge ideas such as event-driven computing, serverless computing, workflow, and services. These cross many research communities, including distributed systems, databases, cyber-physical systems, and parallel computing, which sometimes have inconsistent worldviews. Many capabilities are common across these systems but are often implemented differently in each packaged environment. For example, communication can be bulk synchronous processing or dataflow; scheduling can be dynamic or static; state and fault tolerance can follow different models; execution and data can be streaming or batch, distributed or local. We suggest that one can usefully build a toolkit (which we call Twister2) that supports these different choices and allows fruitful customization for each application area. We illustrate the design of Twister2 with several point studies. We stress the many open questions in very traditional areas, including scheduling, messaging, and checkpointing.
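The design axes listed in the abstract can be pictured as independent, selectable options. The following is a minimal sketch under stated assumptions, not Twister2's actual API: the names `Communication`, `Scheduling`, `Execution`, `ToolkitConfig`, and `schedule` are hypothetical, and the "dynamic" policy is approximated by a greedy least-loaded placement, whereas a real dynamic scheduler would react to load observed at run time.

```python
from dataclasses import dataclass
from enum import Enum


class Communication(Enum):
    BULK_SYNCHRONOUS = "bsp"
    DATAFLOW = "dataflow"


class Scheduling(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"


class Execution(Enum):
    STREAMING = "streaming"
    BATCH = "batch"


@dataclass(frozen=True)
class ToolkitConfig:
    """One point in the design space named in the abstract (hypothetical type)."""
    communication: Communication
    scheduling: Scheduling
    execution: Execution


def schedule(task_costs, n_workers, policy):
    """Return one list of task indices per worker (illustrative helper)."""
    assignment = [[] for _ in range(n_workers)]
    if policy is Scheduling.STATIC:
        # Static: placement is fixed up front and ignores cost (round-robin).
        for i in range(len(task_costs)):
            assignment[i % n_workers].append(i)
    else:
        # Dynamic (approximated): each task goes to the least-loaded worker so far.
        load = [0.0] * n_workers
        for i, cost in enumerate(task_costs):
            w = load.index(min(load))
            assignment[w].append(i)
            load[w] += cost
    return assignment


config = ToolkitConfig(Communication.DATAFLOW, Scheduling.DYNAMIC, Execution.BATCH)
static_plan = schedule([4, 1, 1, 1], 2, Scheduling.STATIC)
dynamic_plan = schedule([4, 1, 1, 1], 2, config.scheduling)
```

With one expensive task and three cheap ones, the static plan splits the tasks evenly by count, while the dynamic approximation keeps the expensive task alone on one worker. The point is only that the same application can be paired with different choices along each axis.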
