Processing-in-Memory Accelerator for Dynamic Neural Network with Run-Time Tuning of Accuracy, Power and Latency

2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI:10.1109/socc49529.2020.9524770

Li Yang, Zhezhi He, Shaahin Angizi, Deliang Fan

{"title":"Processing-in-Memory Accelerator for Dynamic Neural Network with Run-Time Tuning of Accuracy, Power and Latency","authors":"Li Yang, Zhezhi He, Shaahin Angizi, Deliang Fan","doi":"10.1109/socc49529.2020.9524770","DOIUrl":null,"url":null,"abstract":"With the widely deployment of powerful deep neural network (DNN) into smart, but resource limited IoT devices, many prior works have been proposed to compress DNN in a hardware-aware manner to reduce the computing complexity, while maintaining accuracy, such as weight quantization, pruning, convolution decomposition, etc. However, in typical DNN compression methods, a smaller, but fixed, network structure is generated from a relative large background model for resource limited hardware accelerator deployment. However, such optimization lacks the ability to tune its structure on-the-fly to best fit for a dynamic computing hardware resource allocation and workloads. In this paper, we mainly review two of our prior works [1], [2] to address this issue, discussing how to construct a dynamic DNN structure through either uniform or non-uniform channel selection based sub-network sampling. The constructed dynamic DNN could tune its computing path to involve different number of channels, thus providing the ability to trade-off between speed, power and accuracy on-the-fly after model deployment. Correspondingly, an emerging Spin-Orbit Torque Magnetic Random-Access-Memory (SOT-MRAM) based Processing-In-Memory (PIM) accelerator will also be discussed for such dynamic neural network structure.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/socc49529.2020.9524770","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

With the widely deployment of powerful deep neural network (DNN) into smart, but resource limited IoT devices, many prior works have been proposed to compress DNN in a hardware-aware manner to reduce the computing complexity, while maintaining accuracy, such as weight quantization, pruning, convolution decomposition, etc. However, in typical DNN compression methods, a smaller, but fixed, network structure is generated from a relative large background model for resource limited hardware accelerator deployment. However, such optimization lacks the ability to tune its structure on-the-fly to best fit for a dynamic computing hardware resource allocation and workloads. In this paper, we mainly review two of our prior works [1], [2] to address this issue, discussing how to construct a dynamic DNN structure through either uniform or non-uniform channel selection based sub-network sampling. The constructed dynamic DNN could tune its computing path to involve different number of channels, thus providing the ability to trade-off between speed, power and accuracy on-the-fly after model deployment. Correspondingly, an emerging Spin-Orbit Torque Magnetic Random-Access-Memory (SOT-MRAM) based Processing-In-Memory (PIM) accelerator will also be discussed for such dynamic neural network structure.

查看原文本刊更多论文

具有运行时精度、功率和延迟调整的动态神经网络内存处理加速器

随着功能强大的深度神经网络(deep neural network, DNN)在智能但资源有限的物联网设备中的广泛应用，人们提出了许多以硬件感知的方式压缩DNN以降低计算复杂度，同时保持精度的方法，如权值量化、剪枝、卷积分解等。然而，在典型的深度神经网络压缩方法中，对于资源有限的硬件加速器部署，从相对较大的背景模型生成较小但固定的网络结构。然而，这种优化缺乏动态调整其结构以最适合动态计算硬件资源分配和工作负载的能力。在本文中，我们主要回顾了我们之前的两个工作[1]，[2]来解决这个问题，讨论了如何通过基于均匀或非均匀信道选择的子网络采样来构建动态DNN结构。构建的动态深度神经网络可以调整其计算路径以涉及不同数量的通道，从而在模型部署后提供在速度，功率和准确性之间进行权衡的能力。相应的，针对这种动态神经网络结构，还将讨论一种新兴的基于自旋轨道转矩磁随机存取存储器(SOT-MRAM)的内存中处理(PIM)加速器。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2020 IEEE 33rd International System-on-Chip Conference (SOCC)

自引率

0.00%

发文量