Deep-Edge: An Efficient Framework for Deep Learning Model Update on Heterogeneous Edge

2020 IEEE 4th International Conference on Fog and Edge Computing (ICFEC). Pub Date: 2020-04-13. DOI: 10.1109/ICFEC50348.2020.00016

Anirban Bhattacharjee, A. Chhokra, Hongyang Sun, Shashank Shekhar, A. Gokhale, G. Karsai, A. Dubey


Citations: 3

Abstract

Deep Learning (DL) model-based AI services are increasingly offered in a variety of predictive analytics services such as computer vision, natural language processing, and speech recognition. However, the quality of DL models can degrade over time due to changes in the input data distribution, thereby requiring periodic model updates. Although cloud data centers can meet the computational requirements of the resource-intensive and time-consuming model update task, transferring data from the edge devices to the cloud incurs a significant cost in terms of network bandwidth and is prone to data privacy issues. With the advent of GPU-enabled edge devices, the DL model update can be performed at the edge in a distributed manner using multiple connected edge devices. However, efficiently utilizing the edge resources for the model update is a hard problem due to the heterogeneity among the edge devices and the resource interference caused by the colocation of the DL model update task with latency-critical tasks running in the background. To overcome these challenges, we present Deep-Edge, a load- and interference-aware, fault-tolerant resource management framework that uses distributed training to perform model updates at the edge. This paper makes the following contributions. First, it provides a unified framework for monitoring, profiling, and deploying DL model update tasks on heterogeneous edge devices. Second, it presents a scheduler that reduces the total re-training time by appropriately selecting the edge devices and distributing data among them such that no latency-critical application experiences deadline violations. Finally, we present empirical results to validate the efficacy of the framework using a real-world DL model update case study based on the Caltech dataset and an edge AI cluster testbed.
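The scheduling idea described in the abstract (select devices, then distribute data so that re-training finishes quickly without deadline violations) can be illustrated with a minimal sketch. This is not the Deep-Edge scheduler itself, whose details the abstract does not give. It assumes a simple proportional split, and the `Device` fields, `eligible`, `split_data`, and `epoch_time` are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    # All fields are hypothetical inputs that a profiler might supply.
    name: str
    throughput: float      # training samples per second on this device
    bg_latency_ms: float   # predicted latency of the co-located task during training
    bg_deadline_ms: float  # deadline of that latency-critical task


def eligible(devices):
    """Keep only devices whose background task would still meet its deadline."""
    return [d for d in devices if d.bg_latency_ms <= d.bg_deadline_ms]


def split_data(devices, num_samples):
    """Assign samples in proportion to throughput so devices finish at about the same time."""
    chosen = eligible(devices)
    if not chosen:
        raise ValueError("no device can train without a deadline violation")
    total = sum(d.throughput for d in chosen)
    shares = {d.name: int(num_samples * d.throughput / total) for d in chosen}
    # Hand any rounding remainder to the fastest eligible device.
    fastest = max(chosen, key=lambda d: d.throughput)
    shares[fastest.name] += num_samples - sum(shares.values())
    return shares


def epoch_time(devices, shares):
    """Estimated epoch time in seconds: the slowest selected device sets the makespan."""
    by_name = {d.name: d for d in devices}
    return max(n / by_name[name].throughput for name, n in shares.items())


if __name__ == "__main__":
    cluster = [
        Device("a", throughput=100, bg_latency_ms=10, bg_deadline_ms=20),
        Device("b", throughput=300, bg_latency_ms=15, bg_deadline_ms=20),
        Device("c", throughput=500, bg_latency_ms=30, bg_deadline_ms=20),  # would miss its deadline
    ]
    shares = split_data(cluster, 1000)
    print(shares, epoch_time(cluster, shares))
```

In this sketch the fastest device is excluded because training on it would push its background task past the deadline, which is the kind of trade-off an interference-aware scheduler has to weigh against raw speed.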
