一种灵活的动态通道自适应深度神经网络内存处理加速器

2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2020-01-01 DOI:10.1109/ASP-DAC47756.2020.9045166

Li Yang, Shaahin Angizi, Deliang Fan

{"title":"一种灵活的动态通道自适应深度神经网络内存处理加速器","authors":"Li Yang, Shaahin Angizi, Deliang Fan","doi":"10.1109/ASP-DAC47756.2020.9045166","DOIUrl":null,"url":null,"abstract":"With the success of deep neural networks (DNN), many recent works have been focusing on developing hardware accelerator for power and resource-limited embedded system via model compression techniques, such as quantization, pruning, low-rank approximation, etc. However, almost all existing DNN structure is fixed after deployment, which lacks runtime adaptive DNN structure to adapt to its dynamic hardware resource, power budget, throughput requirement, as well as dynamic workload. Correspondingly, there is no runtime adaptive hardware platform to support dynamic DNN structure. To address this problem, we first propose a dynamic channel-adaptive deep neural network (CA-DNN) which can adjust the involved convolution channel (i.e. model size, computing load) at run-time (i.e. at inference stage without retraining) to dynamically trade off between power, speed, computing load and accuracy. Further, we utilize knowledge distillation method to optimize the model and quantize the model to 8-bits and 16-bits, respectively, for hardware friendly mapping. We test the proposed model on CIFAR-10 and ImageNet dataset by using ResNet. Comparing with the same model size of individual model, our CA-DNN achieves better accuracy. Moreover, as far as we know, we are the first to propose a Processing-in-Memory accelerator for such adaptive neural networks structure based on Spin Orbit Torque Magnetic Random Access Memory(SOT-MRAM) computational adaptive sub-arrays. Then, we comprehensively analyze the trade-off of the model with different channel-width between the accuracy and the hardware parameters, eg., energy, memory, and area overhead.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"A Flexible Processing-in-Memory Accelerator for Dynamic Channel-Adaptive Deep Neural Networks\",\"authors\":\"Li Yang, Shaahin Angizi, Deliang Fan\",\"doi\":\"10.1109/ASP-DAC47756.2020.9045166\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the success of deep neural networks (DNN), many recent works have been focusing on developing hardware accelerator for power and resource-limited embedded system via model compression techniques, such as quantization, pruning, low-rank approximation, etc. However, almost all existing DNN structure is fixed after deployment, which lacks runtime adaptive DNN structure to adapt to its dynamic hardware resource, power budget, throughput requirement, as well as dynamic workload. Correspondingly, there is no runtime adaptive hardware platform to support dynamic DNN structure. To address this problem, we first propose a dynamic channel-adaptive deep neural network (CA-DNN) which can adjust the involved convolution channel (i.e. model size, computing load) at run-time (i.e. at inference stage without retraining) to dynamically trade off between power, speed, computing load and accuracy. Further, we utilize knowledge distillation method to optimize the model and quantize the model to 8-bits and 16-bits, respectively, for hardware friendly mapping. We test the proposed model on CIFAR-10 and ImageNet dataset by using ResNet. Comparing with the same model size of individual model, our CA-DNN achieves better accuracy. Moreover, as far as we know, we are the first to propose a Processing-in-Memory accelerator for such adaptive neural networks structure based on Spin Orbit Torque Magnetic Random Access Memory(SOT-MRAM) computational adaptive sub-arrays. Then, we comprehensively analyze the trade-off of the model with different channel-width between the accuracy and the hardware parameters, eg., energy, memory, and area overhead.\",\"PeriodicalId\":125112,\"journal\":{\"name\":\"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASP-DAC47756.2020.9045166\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASP-DAC47756.2020.9045166","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

摘要

随着深度神经网络(deep neural networks, DNN)的成功，近年来许多研究工作都集中在利用量化、剪枝、低秩逼近等模型压缩技术为功率和资源有限的嵌入式系统开发硬件加速器。然而，现有的DNN结构在部署后几乎都是固定的，缺乏运行时自适应的DNN结构来适应其动态硬件资源、功耗预算、吞吐量需求以及动态工作负载。相应地，也没有运行时自适应的硬件平台来支持动态DNN结构。为了解决这个问题，我们首先提出了一个动态通道自适应深度神经网络(CA-DNN)，它可以在运行时(即在不需要再训练的推理阶段)调整所涉及的卷积通道(即模型大小、计算负载)，以动态地在功率、速度、计算负载和精度之间进行权衡。此外，我们利用知识蒸馏方法对模型进行优化，并将模型分别量化为8位和16位，以便进行硬件友好映射。我们使用ResNet在CIFAR-10和ImageNet数据集上对所提出的模型进行了测试。与相同模型尺寸的单个模型相比，我们的CA-DNN达到了更好的准确率。此外，据我们所知，我们首次提出了一种基于自旋轨道扭矩磁随机存取存储器(SOT-MRAM)计算自适应子阵列的自适应神经网络结构的内存处理加速器。然后，综合分析了不同信道宽度的模型在精度和硬件参数(如硬件参数)之间的权衡。，能量，内存和面积开销。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A Flexible Processing-in-Memory Accelerator for Dynamic Channel-Adaptive Deep Neural Networks

With the success of deep neural networks (DNN), many recent works have been focusing on developing hardware accelerator for power and resource-limited embedded system via model compression techniques, such as quantization, pruning, low-rank approximation, etc. However, almost all existing DNN structure is fixed after deployment, which lacks runtime adaptive DNN structure to adapt to its dynamic hardware resource, power budget, throughput requirement, as well as dynamic workload. Correspondingly, there is no runtime adaptive hardware platform to support dynamic DNN structure. To address this problem, we first propose a dynamic channel-adaptive deep neural network (CA-DNN) which can adjust the involved convolution channel (i.e. model size, computing load) at run-time (i.e. at inference stage without retraining) to dynamically trade off between power, speed, computing load and accuracy. Further, we utilize knowledge distillation method to optimize the model and quantize the model to 8-bits and 16-bits, respectively, for hardware friendly mapping. We test the proposed model on CIFAR-10 and ImageNet dataset by using ResNet. Comparing with the same model size of individual model, our CA-DNN achieves better accuracy. Moreover, as far as we know, we are the first to propose a Processing-in-Memory accelerator for such adaptive neural networks structure based on Spin Orbit Torque Magnetic Random Access Memory(SOT-MRAM) computational adaptive sub-arrays. Then, we comprehensively analyze the trade-off of the model with different channel-width between the accuracy and the hardware parameters, eg., energy, memory, and area overhead.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)

自引率

0.00%

发文量