An adaptive data placement strategy in scientific workflows over cloud computing environments

Heewon Kim, Yoonhee Kim
DOI: 10.1109/NOMS.2018.8406191 (https://doi.org/10.1109/NOMS.2018.8406191)
Published in: NOMS 2018 - 2018 IEEE/IFIP Network Operations and Management Symposium
Publication date: 2018-04-23
Citations: 11

Abstract

Data for scientific workflow applications tend to be distributed across many data centers so that they can be stored, retrieved, and transferred effectively. The execution performance of an experiment on such data varies with the placement of the input data and of the intermediate data generated during application execution. However, an initial data placement is unlikely to remain the best plan for long-running experiments, because resource conditions change dynamically over time. We propose an adaptive data placement strategy that accounts for dynamic resource changes to support efficient data-intensive applications. The strategy has two stages. In the build-time stage, it groups the existing datasets into data centers. In the runtime stage, it repeatedly clusters each newly generated dataset into the most appropriate data center, based on task dependency, intensity of data usage, and just-in-time resource availability. In terms of resource utilization, just-in-time data placement performed alongside task execution is more efficient than placement fixed at the initialization stage of an experiment. Experiments show that data movement can be effectively reduced while the workflow is running.
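The runtime stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the scoring formula, so the dependency measure (number of tasks shared by two datasets), the tie-break on free space, and all names (`DataCenter`, `dependency`, `place_dataset`) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class DataCenter:
    """Hypothetical model of a data center; units are arbitrary."""
    name: str
    capacity: float
    used: float = 0.0
    datasets: set = field(default_factory=set)

    @property
    def free(self) -> float:
        return self.capacity - self.used


def dependency(a, b, tasks_using):
    """Assumed dependency measure: number of tasks that use both datasets."""
    return len(tasks_using.get(a, set()) & tasks_using.get(b, set()))


def place_dataset(name, size, tasks_using, centers):
    """Place a newly generated dataset at runtime.

    Only centers with enough free space at this moment are candidates
    (a stand-in for just-in-time resource availability). Among them, pick
    the one whose resident datasets share the most tasks with the new
    dataset; ties go to the center with more free space.
    """
    candidates = [c for c in centers if c.free >= size]
    if not candidates:
        raise RuntimeError(f"no data center can hold {name}")
    best = max(
        candidates,
        key=lambda c: (
            sum(dependency(name, d, tasks_using) for d in c.datasets),
            c.free,
        ),
    )
    best.datasets.add(name)
    best.used += size
    return best


# Illustrative data: d3 is an intermediate dataset produced while the
# workflow runs; it shares task t2 with d1, which lives in dc_a.
tasks_using = {
    "d1": {"t1", "t2"},
    "d2": {"t3"},
    "d3": {"t2", "t4"},
}
centers = [
    DataCenter("dc_a", capacity=100, used=40, datasets={"d1"}),
    DataCenter("dc_b", capacity=100, used=10, datasets={"d2"}),
]
chosen = place_dataset("d3", 20, tasks_using, centers)
print(chosen.name)
```

In this sketch the placement decision is made each time a dataset appears, using the free space observed at that moment, so a center that fills up during a long run simply drops out of the candidate set. A build-time stage would apply the same kind of grouping once to the initial input datasets.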