Performance Optimization of Budget-Constrained MapReduce Workflows in Multi-Clouds

Huiyan Cao, C. Wu
Published in: 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
Publication date: 2018-05-01
DOI: 10.1109/CCGRID.2018.00039 (https://doi.org/10.1109/CCGRID.2018.00039)
Citations: 4

Abstract

With the rapid deployment of cloud infrastructures around the globe and the economic benefit of cloud-based computing and storage services, an increasing number of scientific workflows have been shifted or are in active transition to clouds. As the scale of scientific applications continues to grow, it is now common to deploy data- and network-intensive computing workflows across multi-clouds, where inter-cloud data transfer has a significant impact on both workflow performance and financial cost. We construct rigorous mathematical models to analyze intra- and inter-cloud execution dynamics of scientific workflows and formulate a budget-constrained workflow mapping problem to optimize the network performance of MapReduce-based scientific workflows in Hadoop systems in multi-cloud environments. We show this problem to be NP-complete and design a heuristic solution that takes into consideration module execution, data transfer, and I/O operations. The performance superiority of the proposed mapping solution over existing methods is illustrated through extensive simulations and further verified by real-life workflow experiments deployed in public clouds. We observe a discrepancy of about 15% between our theoretical estimates and real-world experimental measurements, which validates the correctness of our cost models and supports accurate workflow mapping in real systems.
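To make the idea of budget-constrained workflow mapping concrete, the following is a minimal sketch of a generic greedy local search, not the heuristic proposed in the paper. It assumes a toy model in which each module runs on one of a few VM options, cost is execution time multiplied by an hourly price, and data crossing clouds adds both delay and a charge. The names `OPTIONS`, `evaluate`, `greedy_map`, and all numeric values are hypothetical and chosen only for illustration.

```python
# Illustrative only: every number and name below is an assumption, not from the paper.
# Each option is (cloud, speed in work units per hour, price in dollars per hour).
OPTIONS = [("A", 1.0, 1.0), ("A", 2.0, 3.0), ("B", 4.0, 8.0)]
TRANSFER_HOURS_PER_GB = 0.1  # assumed inter-cloud transfer delay
TRANSFER_PRICE_PER_GB = 0.5  # assumed inter-cloud transfer charge


def evaluate(work, edges, mapping):
    """Return (makespan, cost) for a mapping of module name -> index into OPTIONS.

    `work` must list modules in topological order; `edges` maps (src, dst) -> GB.
    """
    finish, cost = {}, 0.0
    for module, units in work.items():
        cloud, speed, price = OPTIONS[mapping[module]]
        ready = 0.0
        for (src, dst), gigabytes in edges.items():
            if dst != module:
                continue
            delay = 0.0
            if OPTIONS[mapping[src]][0] != cloud:  # data crosses clouds
                delay = gigabytes * TRANSFER_HOURS_PER_GB
                cost += gigabytes * TRANSFER_PRICE_PER_GB
            ready = max(ready, finish[src] + delay)
        hours = units / speed
        finish[module] = ready + hours
        cost += hours * price
    return max(finish.values()), cost


def greedy_map(work, edges, budget):
    """Greedy local search: apply the single reassignment that cuts makespan most."""
    mapping = {module: 0 for module in work}  # option 0 is the cheapest here
    best_time, best_cost = evaluate(work, edges, mapping)
    if best_cost > budget:
        raise ValueError("budget is below the cost of the cheapest mapping")
    while True:
        best_move = None
        for module in work:
            for option in range(len(OPTIONS)):
                trial = {**mapping, module: option}
                time, cost = evaluate(work, edges, trial)
                if cost <= budget and time < best_time:
                    best_move, best_time, best_cost = trial, time, cost
        if best_move is None:
            return mapping, best_time, best_cost
        mapping = best_move


# Toy three-module pipeline with 2 GB passed between consecutive modules.
work = {"m1": 4.0, "m2": 8.0, "m3": 4.0}
edges = {("m1", "m2"): 2.0, ("m2", "m3"): 2.0}
mapping, makespan, cost = greedy_map(work, edges, budget=24.0)
print(mapping, makespan, cost)
```

The search starts from the cheapest mapping and only accepts moves that stay within the budget, so the result is always feasible but not necessarily optimal, which is in keeping with the problem being NP-complete.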