Automating Deep Neural Network Model Selection for Edge Inference

2019 IEEE First International Conference on Cognitive Machine Intelligence (CogMI) | Pub Date: 2019-12-01 | DOI: 10.1109/CogMI48466.2019.00035

Bingqian Lu, Jianyi Yang, L. Chen, Shaolei Ren

Citations: 16

Abstract

The ever-increasing size of deep neural network (DNN) models once implied that runtime inference was limited to cloud data centers. The recent plethora of DNN model compression techniques has overcome this limit, making it possible to run DNN-based inference on numerous resource-constrained edge devices, including mobile phones, drones, robots, medical devices, wearables, and Internet of Things devices, among many others. Edge devices are naturally highly heterogeneous in terms of hardware specifications and usage scenarios. Compressed DNN models, in turn, are so diverse that they exhibit different tradeoffs in a multi-dimensional space, and no single model is optimal across all important metrics such as accuracy, latency, and energy consumption. Consequently, how to automatically select a compressed DNN model for an edge device to run inference with optimal quality of experience (QoE) arises as a new challenge. State-of-the-art approaches either choose a common model for all or most devices, which is optimal for a small fraction of edge devices at best, or apply device-specific DNN model compression, which is not scalable. In this paper, by leveraging the predictive power of machine learning and keeping end users in the loop, we envision an automated device-level DNN model selection engine for QoE-optimal edge inference. To concretize our vision, we formulate the DNN model selection problem as a contextual multi-armed bandit problem, where features of edge devices and DNN models are the contexts and pre-trained DNN models are the arms, selected online based on the history of actions and users' QoE feedback. We develop an efficient online learning algorithm to balance exploration and exploitation. Our preliminary simulation results validate our algorithm and highlight the potential of machine learning for automating DNN model selection to achieve QoE-optimal edge inference.
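The contextual-bandit formulation in the abstract can be illustrated with a small sketch. The abstract does not specify the paper's online learning algorithm, so the code below substitutes a standard disjoint LinUCB selector as a stand-in; it is not the authors' method. The class name `LinUCBSelector`, the two-arm setup, the device "speed" feature, the synthetic QoE functions, and the value of `alpha` are all assumptions made for illustration, and the context here uses device features only, which is a simplification of the device-plus-model contexts the abstract describes.

```python
import math
import random


class LinUCBSelector:
    """Disjoint LinUCB: one ridge-regression QoE model per arm (candidate DNN)."""

    def __init__(self, n_arms, dim, alpha=0.5):
        self.alpha = alpha  # exploration weight (illustrative value)
        # Per arm: inverse of (I + sum of x x^T), and the sum of reward * x.
        self.A_inv = [[[1.0 if i == j else 0.0 for j in range(dim)]
                       for i in range(dim)] for _ in range(n_arms)]
        self.b = [[0.0] * dim for _ in range(n_arms)]

    @staticmethod
    def _matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def _ucb(self, arm, x):
        Ax = self._matvec(self.A_inv[arm], x)
        theta = self._matvec(self.A_inv[arm], self.b[arm])  # ridge estimate
        mean = sum(t * xi for t, xi in zip(theta, x))        # predicted QoE
        width = math.sqrt(max(sum(a * xi for a, xi in zip(Ax, x)), 0.0))
        return mean + self.alpha * width                      # optimism bonus

    def select(self, x):
        """Pick the arm with the highest upper confidence bound for context x."""
        scores = [self._ucb(arm, x) for arm in range(len(self.b))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, x, reward):
        """Fold one observed QoE value into the chosen arm's model."""
        Ax = self._matvec(self.A_inv[arm], x)
        denom = 1.0 + sum(a * xi for a, xi in zip(Ax, x))
        A = self.A_inv[arm]
        # Sherman-Morrison rank-one update of the (symmetric) inverse.
        for i in range(len(x)):
            for j in range(len(x)):
                A[i][j] -= Ax[i] * Ax[j] / denom
        for i in range(len(x)):
            self.b[arm][i] += reward * x[i]


def simulate(rounds=3000, seed=0):
    """Synthetic loop: arm 0 is a small model, arm 1 a large model (both made up)."""
    rng = random.Random(seed)
    selector = LinUCBSelector(n_arms=2, dim=2)
    for _ in range(rounds):
        speed = rng.random()          # hypothetical device feature in [0, 1)
        x = [1.0, speed]              # bias term plus device feature
        arm = selector.select(x)
        # Assumed QoE: the small model is steady; the large one needs a fast device.
        true_qoe = 0.6 if arm == 0 else 0.2 + 0.8 * speed
        selector.update(arm, x, true_qoe + rng.gauss(0.0, 0.05))
    return selector
```

Calling `simulate()` and then `select([1.0, speed])` for a given device shows which model the learned policy would serve. Under the synthetic QoE above, fast devices should end up with the large model and slow devices with the small one; the confidence-width term is what drives exploration of models that have rarely been tried on a given kind of device.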

