Spark Streaming在Apache YARN上的可配置和可执行模型

Jia-Chun Lin, Ming-Chang Lee, Ingrid Chieh Yu, E. Johnsen
{"title":"Spark Streaming在Apache YARN上的可配置和可执行模型","authors":"Jia-Chun Lin, Ming-Chang Lee, Ingrid Chieh Yu, E. Johnsen","doi":"10.1504/ijguc.2020.10026548","DOIUrl":null,"url":null,"abstract":"Streams of data are produced today at an unprecedented scale. Efficient and stable processing of these streams requires a careful interplay between the parameters of the streaming application and of the underlying stream processing framework. Today, finding these parameters happens by trial and error on the complex, deployed framework. This paper shows that high-level models can help to determine these parameters by predicting and comparing the performance of streaming applications running on stream processing frameworks with different configurations. To demonstrate this approach, this paper considers Spark Streaming, a widely used framework to leverage data streams on the fly and provide real-time stream processing. Technically, we develop a configurable and executable model to simulate both the streaming applications and the underlying Spark stream processing framework. Furthermore, we model the deployment of Spark Streaming on Apache YARN, which is a popular open-source distributed software framework for big data processing. We show that the developed model provides a satisfactory accuracy for predicting performance by means of empirical validation.","PeriodicalId":375871,"journal":{"name":"Int. J. Grid Util. Comput.","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"A configurable and executable model of Spark Streaming on Apache YARN\",\"authors\":\"Jia-Chun Lin, Ming-Chang Lee, Ingrid Chieh Yu, E. Johnsen\",\"doi\":\"10.1504/ijguc.2020.10026548\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Streams of data are produced today at an unprecedented scale. Efficient and stable processing of these streams requires a careful interplay between the parameters of the streaming application and of the underlying stream processing framework. Today, finding these parameters happens by trial and error on the complex, deployed framework. This paper shows that high-level models can help to determine these parameters by predicting and comparing the performance of streaming applications running on stream processing frameworks with different configurations. To demonstrate this approach, this paper considers Spark Streaming, a widely used framework to leverage data streams on the fly and provide real-time stream processing. Technically, we develop a configurable and executable model to simulate both the streaming applications and the underlying Spark stream processing framework. Furthermore, we model the deployment of Spark Streaming on Apache YARN, which is a popular open-source distributed software framework for big data processing. We show that the developed model provides a satisfactory accuracy for predicting performance by means of empirical validation.\",\"PeriodicalId\":375871,\"journal\":{\"name\":\"Int. J. Grid Util. Comput.\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-02-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. J. Grid Util. Comput.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1504/ijguc.2020.10026548\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Grid Util. Comput.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1504/ijguc.2020.10026548","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

摘要

今天,数据流以前所未有的规模产生。高效和稳定地处理这些流需要在流应用程序和底层流处理框架的参数之间进行仔细的相互作用。如今,要找到这些参数,需要在复杂的已部署框架上反复试验。本文表明,高级模型可以通过预测和比较在不同配置的流处理框架上运行的流应用程序的性能来帮助确定这些参数。为了演示这种方法,本文考虑了Spark Streaming,这是一个广泛使用的框架,用于动态利用数据流并提供实时流处理。在技术上,我们开发了一个可配置和可执行的模型来模拟流应用程序和底层Spark流处理框架。此外,我们还对Spark Streaming在Apache YARN上的部署进行了建模,Apache YARN是一个流行的大数据处理开源分布式软件框架。通过实证验证,表明所建立的模型具有较好的预测精度。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A configurable and executable model of Spark Streaming on Apache YARN
Streams of data are produced today at an unprecedented scale. Efficient and stable processing of these streams requires a careful interplay between the parameters of the streaming application and of the underlying stream processing framework. Today, finding these parameters happens by trial and error on the complex, deployed framework. This paper shows that high-level models can help to determine these parameters by predicting and comparing the performance of streaming applications running on stream processing frameworks with different configurations. To demonstrate this approach, this paper considers Spark Streaming, a widely used framework to leverage data streams on the fly and provide real-time stream processing. Technically, we develop a configurable and executable model to simulate both the streaming applications and the underlying Spark stream processing framework. Furthermore, we model the deployment of Spark Streaming on Apache YARN, which is a popular open-source distributed software framework for big data processing. We show that the developed model provides a satisfactory accuracy for predicting performance by means of empirical validation.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信