Workload Partitioning and Task Migration to Reduce Response Times in Heterogeneous Computing Environments

Dominik Schäfer, Janick Edinger, Martin Breitbach, C. Becker
Published in: 2018 27th International Conference on Computer Communication and Networks (ICCCN)
Publication date: 2018-07-01
DOI: 10.1109/ICCCN.2018.8487326 (https://doi.org/10.1109/ICCCN.2018.8487326)
Citations: 7

Abstract

Today's computing landscape consists of a large number of heterogeneous devices, including powerful, stable desktop computers as well as lightweight, unreliable mobile edge devices. This heterogeneity in computation power and reliability increases the complexity of fault tolerance in distributed computing systems. When tasks are offloaded, slow resource providers easily become the bottleneck of a parallel computation. Further, unstable edge devices can leave the system spontaneously, discontinue remote task executions, and thereby lose the computation progress. These two effects increase the response time of remote task executions. In this paper, we introduce two mechanisms to avoid delayed or lost task executions caused by edge devices. This paper makes five contributions. First, we define a failure model and identify the parameters that determine the magnitude of delays caused by faults and performance bottlenecks. Second, we present reactive and proactive task migration to handle system leaves. Third, we show how computational bottlenecks can be avoided by two-dimensional context-aware task partitioning. Fourth, we integrate these two solutions into an existing heterogeneous distributed computing system. Fifth, we run an evaluation on a real-world testbed to show the benefits of the solutions in practice. The evaluation shows that we can improve systems with device fluctuation and heterogeneity by up to 39% and 53%, respectively.
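The bottleneck effect described in the abstract can be illustrated with a small sketch. This is not the paper's two-dimensional context-aware partitioning, which the abstract does not specify; it is a simplified, one-dimensional illustration that assumes each provider's speed is known in advance and that work is perfectly divisible. The function names, the speed values, and the workload size are all hypothetical.

```python
def makespan(shares, speeds):
    """Response time of a parallel job: the slowest provider finishes last."""
    return max(share / speed for share, speed in zip(shares, speeds))


def equal_split(total_work, speeds):
    """Give every provider the same share, ignoring its speed."""
    return [total_work / len(speeds)] * len(speeds)


def proportional_split(total_work, speeds):
    """Give every provider a share proportional to its (assumed known) speed."""
    total_speed = sum(speeds)
    return [total_work * speed / total_speed for speed in speeds]


# Hypothetical providers: one desktop, one laptop, two slow edge devices.
speeds = [4.0, 2.0, 1.0, 1.0]  # work units per second (assumed values)
total_work = 80.0              # work units (assumed value)

equal_time = makespan(equal_split(total_work, speeds), speeds)
proportional_time = makespan(proportional_split(total_work, speeds), speeds)

print(f"equal split:        {equal_time:.1f} s")
print(f"proportional split: {proportional_time:.1f} s")
```

With an equal split, the two slowest devices determine the response time of the whole job; with a speed-proportional split, all providers finish at the same time under these assumptions. A real system would additionally have to account for devices leaving mid-execution, which is what the task migration mechanisms in the abstract address.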