A Data Placement Strategy for Data-Intensive Scientific Workflows in Cloud

Qing Zhao, Congcong Xiong, Xi Zhao, Ce Yu, Jian Xiao
2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 928-934. Published 2015-05-04. DOI: 10.1109/CCGrid.2015.72 (https://doi.org/10.1109/CCGrid.2015.72)
Citations: 27

Abstract

With the arrival of cloud computing and Big Data, many scientific applications that process large amounts of data can be abstracted as scientific workflows and run in a cloud environment. Distributing their datasets intelligently can effectively reduce data transfers during workflow execution. In this paper, we propose a two-stage data placement strategy. In the initial stage, we cluster the datasets based on their correlation and allocate the clusters to data centers. Compared with existing work, we incorporate data size into the correlation calculation and propose a new type of data correlation for intermediate data, named "the first order conduction correlation". Hence the data transmission cost can be measured more reasonably. In the runtime stage, the redistribution algorithm adjusts the data layout according to changed factors, and the overhead of the re-layout itself is also measured. Simulation results show that, compared with previous work, the proposed strategy effectively reduces the time spent on data movement during workflow execution.
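The initial stage described above, size-aware correlation followed by allocation to data centers, can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract gives no formulas, so the weighting rule (each task shared by two datasets contributes the smaller of their sizes), the greedy allocation, and all names (`size_weighted_correlation`, `greedy_placement`, the sample datasets and capacities) are assumptions made for illustration. The conduction correlation for intermediate data and the runtime redistribution stage are not modelled.

```python
from itertools import combinations


def size_weighted_correlation(sizes, task_inputs):
    """Return {(a, b): weight} for every dataset pair read by a common task.

    Assumption (not from the paper): each shared task contributes
    min(size_a, size_b), roughly the data that would have to move if the
    pair were split across two data centers.
    """
    corr = {}
    for inputs in task_inputs.values():
        for a, b in combinations(sorted(inputs), 2):
            corr[(a, b)] = corr.get((a, b), 0) + min(sizes[a], sizes[b])
    return corr


def greedy_placement(sizes, corr, capacities):
    """Assign each dataset to one data center, largest datasets first.

    A dataset goes to the center with enough free space whose current
    contents are most correlated with it; ties go to the center with the
    most free space. This greedy rule is a stand-in for real clustering.
    """
    free = dict(capacities)
    placement = {}

    def affinity(ds, center):
        # Total correlation between ds and datasets already on this center.
        return sum(
            corr.get(tuple(sorted((ds, other))), 0)
            for other, c in placement.items()
            if c == center
        )

    for ds in sorted(sizes, key=lambda d: (-sizes[d], d)):
        candidates = [c for c in free if free[c] >= sizes[ds]]
        if not candidates:
            raise ValueError(f"no data center can hold {ds}")
        best = max(candidates, key=lambda c: (affinity(ds, c), free[c]))
        placement[ds] = best
        free[best] -= sizes[ds]
    return placement


# Hypothetical workload: sizes in arbitrary units, tasks mapped to their inputs.
sizes = {"d1": 40, "d2": 30, "d3": 20, "d4": 10}
task_inputs = {"t1": {"d1", "d2"}, "t2": {"d1", "d2"}, "t3": {"d3", "d4"}}
capacities = {"dc1": 70, "dc2": 50}

corr = size_weighted_correlation(sizes, task_inputs)
placement = greedy_placement(sizes, corr, capacities)
print(placement)
```

In this toy workload, `d1` and `d2` are read together by two tasks and land on the same data center, while `d3` and `d4` share the other, so no task needs a cross-center transfer. Weighting by size rather than by task count alone means that splitting a pair of large datasets is penalised more than splitting a pair of small ones.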