Towards Efficient Compact Network Training on Edge-Devices

Feng Xiong, Fengbin Tu, S. Yin, Shaojun Wei
{"title":"Towards Efficient Compact Network Training on Edge-Devices","authors":"Feng Xiong, Fengbin Tu, S. Yin, Shaojun Wei","doi":"10.1109/ISVLSI.2019.00020","DOIUrl":null,"url":null,"abstract":"Currently, there is a trend to deploy training on edge devices, which is crucial to future AI applications in various scenarios with transfer and online learning demands. Specifically, there may be a severe degradation of accuracy when directly deploying the trained models on edge devices, because the local environment forms an edge local dataset that is often different from the generic dataset. However, training on edge devices with limited computing and memory capability is a challenge problem. In this paper, we propose a novel quantization training framework for efficient compact network training on edge devices. Firstly, training-aware symmetric quantization is introduced to quantize all of the data types in the training process. Then, channel-wise quantization method is adopted for comapact network quantization, which has significantly high tolerance to quantization errors and can make the training process more stable. For further efficient training, we build a hardware evaluation platform to evaluate different settings of the network, so as to achieve a better trade-off among accuracy, energy and latency. Finally, we evaluate two widely used compact networks on a domain adaptation dataset for image classification, and the results demonstrate that the proposed methods can allow us achieve an improvement of 8.4 × -17.2× in energy reduction and 11.9 × -16.3× in latency reduction compared with 32-bit implementations, while maintaining the classification accuracy.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"306 1","pages":"61-67"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISVLSI.2019.00020","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2

Abstract

Currently, there is a trend toward deploying training on edge devices, which is crucial for future AI applications in scenarios with transfer- and online-learning demands. In particular, accuracy may degrade severely when trained models are deployed directly on edge devices, because the local environment forms an edge-local dataset that often differs from the generic dataset. However, training on edge devices with limited computing and memory capability is a challenging problem. In this paper, we propose a novel quantization training framework for efficient compact network training on edge devices. First, training-aware symmetric quantization is introduced to quantize all of the data types in the training process. Then, a channel-wise quantization method is adopted for compact network quantization; it has significantly higher tolerance to quantization errors and makes the training process more stable. For further training efficiency, we build a hardware evaluation platform to evaluate different network settings, so as to achieve a better trade-off among accuracy, energy, and latency. Finally, we evaluate two widely used compact networks on a domain adaptation dataset for image classification, and the results demonstrate that the proposed methods achieve an 8.4x-17.2x reduction in energy and an 11.9x-16.3x reduction in latency compared with 32-bit implementations, while maintaining classification accuracy.
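
The abstract names two techniques, symmetric quantization and channel-wise quantization, without code. As a rough illustration only, the following minimal NumPy sketch shows what symmetric, channel-wise weight quantization generally looks like; the function name, 8-bit width, and tensor shapes are illustrative assumptions, not the paper's implementation.

import numpy as np

def symmetric_quantize_per_channel(w, num_bits=8, channel_axis=0):
    """Symmetric quantization with one scale per channel (illustrative sketch).

    Symmetric means zero maps exactly to zero (no zero-point offset);
    channel-wise means each channel's scale is fit to that channel's own
    range, which tolerates quantization error better than one tensor-wide scale.
    """
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 127 for 8-bit
    # Reduce |w| over every axis except the channel axis.
    reduce_axes = tuple(ax for ax in range(w.ndim) if ax != channel_axis)
    max_abs = np.abs(w).max(axis=reduce_axes, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / qmax, 1.0)  # guard all-zero channels
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

# Example: a conv weight tensor shaped (out_channels, in_channels, kH, kW).
w = np.random.randn(16, 3, 3, 3).astype(np.float32)
q, scale = symmetric_quantize_per_channel(w)
w_hat = q.astype(np.float32) * scale  # dequantize to approximate w
print("max abs error:", np.abs(w - w_hat).max())

In a quantization training framework of the kind the paper describes, a step like this would be applied to weights, activations, and gradients during training rather than only after it; the sketch above covers just the per-channel weight case.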