LegoDNN: block-grained scaling of deep neural networks for mobile vision

Proceedings of the 27th Annual International Conference on Mobile Computing and Networking. Pub Date: 2021-10-19. DOI: 10.1145/3447993.3483249

Rui Han, Qinglong Zhang, C. Liu, Guoren Wang, Jian Tang, L. Chen


Citations: 23

Abstract

Deep neural networks (DNNs) have become ubiquitous in mobile and embedded systems for applications such as image/object recognition and classification. The trend of executing multiple DNNs simultaneously exacerbates the existing difficulty of meeting stringent latency and accuracy requirements on resource-constrained mobile devices. Prior work explores the accuracy-resource tradeoff by scaling model sizes in accordance with resource dynamics. However, such model scaling approaches face two imminent challenges: (i) a large space of model sizes to explore, and (ii) prohibitively long training time for different model combinations. In this paper, we present LegoDNN, a lightweight, block-grained scaling solution for running multi-DNN workloads in mobile vision systems. LegoDNN guarantees short model training times by extracting and training only a small number of common blocks (e.g., 5 in VGG and 8 in ResNet) in a DNN. At run-time, LegoDNN optimally combines the descendant models of these blocks to maximize accuracy under specific resource and latency constraints, while reducing switching overhead via smart block-level scaling of the DNN. We implement LegoDNN in TensorFlow Lite and extensively evaluate it against state-of-the-art techniques (FLOP scaling, knowledge distillation, and model compression) using a set of 12 popular DNN models. Evaluation results show that LegoDNN provides 1,296x to 279,936x more options in model sizes without increasing training time, thus achieving as much as a 31.74% improvement in inference accuracy and a 71.07% reduction in scaling energy consumption.
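The run-time step described in the abstract, choosing one descendant per block so that accuracy is maximized under resource and latency limits, can be illustrated with a small sketch. This is not the authors' optimizer, which the abstract does not describe: it is a brute-force search, and it assumes (hypothetically) that each block variant carries an estimated accuracy loss, memory footprint, and latency, and that these quantities add up across blocks. The names `Variant`, `count_combinations`, and `select_variants` are invented for illustration. The value of 6 variants per block in the usage lines is likewise an assumption, chosen only because it reproduces the reported 1,296x and 279,936x figures when compared against scaling the whole model in 6 steps.

```python
from itertools import product
from typing import NamedTuple, Optional, Sequence, Tuple


class Variant(NamedTuple):
    """One hypothetical descendant of a block (all fields are assumed estimates)."""
    accuracy_loss: float  # estimated accuracy drop caused by this variant
    memory_mb: float      # memory footprint of this variant
    latency_ms: float     # inference latency of this variant


def count_combinations(num_blocks: int, variants_per_block: int) -> int:
    """Number of distinct models obtainable by picking one variant per block."""
    return variants_per_block ** num_blocks


def select_variants(
    blocks: Sequence[Sequence[Variant]],
    memory_budget_mb: float,
    latency_budget_ms: float,
) -> Optional[Tuple[int, ...]]:
    """Exhaustively pick one variant index per block.

    Returns the feasible combination with the lowest total estimated
    accuracy loss, or None if no combination fits both budgets.
    Assumes loss, memory, and latency are additive across blocks.
    """
    best_choice: Optional[Tuple[int, ...]] = None
    best_loss = float("inf")
    for choice in product(*(range(len(b)) for b in blocks)):
        picked = [blocks[i][j] for i, j in enumerate(choice)]
        if sum(v.memory_mb for v in picked) > memory_budget_mb:
            continue
        if sum(v.latency_ms for v in picked) > latency_budget_ms:
            continue
        loss = sum(v.accuracy_loss for v in picked)
        if loss < best_loss:
            best_choice, best_loss = choice, loss
    return best_choice


# Assumed 6 variants per block, compared with 6 whole-model sizes:
vgg_ratio = count_combinations(5, 6) // 6      # 5 blocks -> 1296
resnet_ratio = count_combinations(8, 6) // 6   # 8 blocks -> 279936

# Toy example with two blocks and two variants each (made-up numbers).
toy_blocks = [
    [Variant(0.0, 10.0, 5.0), Variant(1.0, 4.0, 2.0)],
    [Variant(0.0, 8.0, 4.0), Variant(0.5, 3.0, 1.5)],
]
toy_choice = select_variants(toy_blocks, memory_budget_mb=14.0, latency_budget_ms=10.0)
```

Exhaustive search is practical only for toy inputs like the one above; with 8 blocks and 6 variants each it would already visit about 1.7 million combinations, so a real system would need a faster optimization method.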



