Dominik Schäfer, Janick Edinger, Martin Breitbach, C. Becker
Workload Partitioning and Task Migration to Reduce Response Times in Heterogeneous Computing Environments
2018 27th International Conference on Computer Communication and Networks (ICCCN), published 2018-07-01
DOI: 10.1109/ICCCN.2018.8487326 (https://doi.org/10.1109/ICCCN.2018.8487326)
Citations: 7
Abstract
Today's computing landscape consists of a large number of heterogeneous devices, ranging from powerful, stable desktop computers to lightweight, unreliable mobile edge devices. This heterogeneity in computation power and reliability complicates fault tolerance in distributed computing systems. When tasks are offloaded, slow resource providers easily become the bottleneck of a parallel computation. Further, unstable edge devices can leave the system spontaneously, discontinue remote task executions, and thereby lose the computation progress. Both effects increase the response time of remote task executions. In this paper, we introduce two mechanisms to avoid delayed or lost task executions caused by edge devices. This paper makes five contributions. First, we define a failure model and identify the parameters that determine the magnitude of delays caused by faults and performance bottlenecks. Second, we present reactive and proactive task migration to handle devices that leave the system. Third, we show how computational bottlenecks can be avoided by two-dimensional context-aware task partitioning. Fourth, we integrate these two solutions into an existing heterogeneous distributed computing system. Fifth, we run an evaluation on a real-world testbed to show the benefits of the solutions in practice. The evaluation shows that we can improve the response times of systems with device fluctuation and heterogeneity by up to 39% and 53%, respectively.
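The bottleneck effect described in the abstract can be illustrated with a small sketch. It is not the paper's partitioning scheme: it only compares an equal split of a divisible workload with a split proportional to provider speed. The speeds, the workload size, and all function names are hypothetical, and communication cost and device departures are ignored.

```python
# Illustrative sketch only, not the authors' algorithm.
# Assumptions: work is perfectly divisible, each provider has a known
# constant speed (work units per second), and transfer cost is zero.

def makespan(shares, speeds):
    """Response time of the parallel job: the slowest provider decides."""
    return max(share / speed for share, speed in zip(shares, speeds))


def equal_split(total_work, speeds):
    """Give every provider the same amount of work."""
    return [total_work / len(speeds)] * len(speeds)


def proportional_split(total_work, speeds):
    """Give every provider work in proportion to its speed."""
    total_speed = sum(speeds)
    return [total_work * speed / total_speed for speed in speeds]


# Hypothetical providers: one fast desktop, one mid-range, two slow devices.
speeds = [4.0, 2.0, 1.0, 1.0]
total_work = 80.0

equal_time = makespan(equal_split(total_work, speeds), speeds)
proportional_time = makespan(proportional_split(total_work, speeds), speeds)

print(f"equal split:        {equal_time:.1f} s")
print(f"proportional split: {proportional_time:.1f} s")
```

With an equal split, the two slowest providers hold up the whole computation, whereas the proportional split lets all providers finish at the same time. A real system would additionally have to estimate speeds at runtime and account for devices that leave mid-task, which this sketch does not attempt.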