GeoClone: Online Task Replication and Scheduling for Geo-Distributed Analytics under Uncertainties

Tiantian Wang, Zhuzhong Qian, Lei Jiao, Xin Li, Sanglu Lu
2020 IEEE/ACM 28th International Symposium on Quality of Service (IWQoS), June 2020
DOI: 10.1109/IWQoS49365.2020.9212862
Citations: 6

Abstract

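The execution and completion of analytics jobs can be significantly inflated by their slowest tasks. Although task replication is widely adopted to reduce such straggler latency, existing replication strategies are unsuitable for geo-distributed analytics environments, which are highly dynamic, uncertain, and heterogeneous. In this paper, we first model the task replication and scheduling problem over time, capturing the features of geo-distributed analytics. We then design an online algorithm, GeoClone, that selects which tasks to replicate and which sites should execute the task replicas. It makes these choices irrevocably and online, jointly considering the execution progress of each job and the resource performance of each site. We rigorously prove GeoClone's competitive ratio, which gives a theoretical performance guarantee relative to the offline optimal algorithm that knows all inputs in advance. Finally, we implement GeoClone with Spark and Yarn for experiments and also conduct extensive large-scale simulations. These confirm GeoClone's practical superiority over multiple state-of-the-art replication strategies.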
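The abstract does not spell out GeoClone's decision rule. The Python sketch below is only a rough illustration of what an irrevocable online replication decision can look like, and it uses an assumed toy model. `Site`, `Task`, `place_online`, `completion_time`, `lag_threshold`, and the site names are all hypothetical. In this model, each arriving task runs on the site with the highest estimated speed that has a free slot. If the task's job is lagging, it also gets one replica on the next-fastest site. Estimated speeds drive the decision, while realized speeds, unknown in advance, determine when the task actually finishes. This is not GeoClone's algorithm and carries no competitive-ratio guarantee.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    est_speed: float  # estimated work units per second, known at decision time
    free_slots: int   # executor slots still available at this site


@dataclass
class Task:
    task_id: str
    job_id: str
    work: float       # amount of work, in the same units as est_speed


def place_online(task, sites, job_progress, lag_threshold=0.5):
    """Irrevocably place `task` and at most one replica when it arrives.

    Hypothetical rule (NOT GeoClone's algorithm): run the primary copy on the
    site with the highest estimated speed that still has a free slot; if the
    task's job has finished less than `lag_threshold` of its tasks, launch one
    replica on the next-fastest site with a free slot.
    """
    candidates = sorted(
        (s for s in sites if s.free_slots > 0),
        key=lambda s: s.est_speed,
        reverse=True,
    )
    if not candidates:
        return []  # no capacity anywhere; a real scheduler would queue the task
    chosen = candidates[:1]
    lagging = job_progress.get(task.job_id, 0.0) < lag_threshold
    if lagging and len(candidates) > 1:
        chosen.append(candidates[1])
    for site in chosen:
        site.free_slots -= 1  # slots are committed and never reclaimed
    return [site.name for site in chosen]


def completion_time(task, placed_sites, realized_speed):
    """A replicated task finishes as soon as its fastest copy finishes.

    `placed_sites` must be non-empty (at least one copy was launched).
    """
    return min(task.work / realized_speed[name] for name in placed_sites)


if __name__ == "__main__":
    sites = [Site("site-a", 4.0, 1), Site("site-b", 3.0, 2), Site("site-c", 1.0, 2)]
    progress = {"j1": 0.2, "j2": 0.9}  # fraction of each job's tasks already done
    t1, t2 = Task("t1", "j1", 12.0), Task("t2", "j2", 6.0)
    p1 = place_online(t1, sites, progress)  # ['site-a', 'site-b']
    p2 = place_online(t2, sites, progress)  # ['site-b']
    # Realized speeds differ from the estimates: site-a turns out to straggle.
    realized = {"site-a": 1.0, "site-b": 3.0, "site-c": 1.0}
    print(completion_time(t1, p1, realized))  # 4.0 (the replica on site-b wins)
    print(completion_time(t2, p2, realized))  # 2.0
```

Restricting replicas to lagging jobs spends spare slots where a straggler would hurt job completion most. Committing slots immediately reflects the online setting, in which a decision cannot wait for future arrivals. Judging such a rule as the abstract describes would mean comparing its cost against an offline optimum that knows all arrivals and realized speeds in advance. The worst-case ratio of the two over all inputs is the rule's competitive ratio.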