Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han
{"title":"Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications","authors":"Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han","doi":"10.1145/3486618","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing, and speech recognition. However, their superior performance comes at the considerable cost of computational complexity, which greatly hinders their applications in many resource-constrained devices, such as mobile phones and Internet of Things (IoT) devices. Therefore, methods and techniques that are able to lift the efficiency bottleneck while preserving the high accuracy of DNNs are in great demand to enable numerous edge AI applications. This article provides an overview of efficient deep learning methods, systems, and applications. We start from introducing popular model compression methods, including pruning, factorization, quantization, as well as compact model design. To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization. We then cover efficient on-device training to enable user customization based on the local data on mobile devices. Apart from general acceleration techniques, we also showcase several task-specific accelerations for point cloud, video, and natural language processing by exploiting their spatial sparsity and temporal/token redundancy. Finally, to support all these algorithmic advancements, we introduce the efficient deep learning system design from both software and hardware perspectives.","PeriodicalId":6933,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","volume":"119 1","pages":"1 - 50"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"47","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Design Automation of Electronic Systems (TODAES)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3486618","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 47

Abstract

Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing, and speech recognition. However, their superior performance comes at the considerable cost of computational complexity, which greatly hinders their application on many resource-constrained devices, such as mobile phones and Internet of Things (IoT) devices. Therefore, methods and techniques that can lift the efficiency bottleneck while preserving the high accuracy of DNNs are in great demand to enable numerous edge AI applications. This article provides an overview of efficient deep learning methods, systems, and applications. We start by introducing popular model compression methods, including pruning, factorization, and quantization, as well as compact model design. To reduce the large design cost of these manual solutions, we discuss AutoML frameworks for each of them, such as neural architecture search (NAS) and automated pruning and quantization. We then cover efficient on-device training to enable user customization based on the local data on mobile devices. Apart from general acceleration techniques, we also showcase several task-specific accelerations for point cloud, video, and natural language processing by exploiting their spatial sparsity and temporal/token redundancy. Finally, to support all these algorithmic advancements, we introduce efficient deep learning system design from both software and hardware perspectives.
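To make the compression primitives surveyed above concrete, the sketch below illustrates two of them, magnitude-based weight pruning and symmetric uniform (int8) quantization, on a plain NumPy weight tensor. It is a minimal illustration written for this page rather than code from the article; the function names, the per-tensor scale, and the 50% sparsity target are assumptions chosen here for brevity.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so that `sparsity` fraction become zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def uniform_quantize(weights: np.ndarray, num_bits: int = 8):
    """Symmetric uniform quantization of a weight tensor to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8-bit
    scale = np.max(np.abs(weights)) / qmax    # one scale per tensor (an assumption here)
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale                           # dequantize with q * scale

if __name__ == "__main__":
    w = np.random.randn(64, 64).astype(np.float32)
    w_pruned = magnitude_prune(w, sparsity=0.5)        # ~50% of weights set to zero
    q, scale = uniform_quantize(w_pruned, num_bits=8)  # store int8 values + one fp scale
    print("zero fraction:", np.mean(w_pruned == 0))
    print("max dequantization error:", np.max(np.abs(q * scale - w_pruned)))
```

In practice, deployed pipelines typically prune and quantize layer by layer (often with per-channel scales) and fine-tune the network afterwards to recover accuracy; this kind of manual tuning is precisely what the AutoML approaches discussed in the article aim to automate.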