Array Aware Training/Pruning: Methods for Efficient Forward Propagation on Array-based Neural Network Accelerators

Krishna Teja Chitty-Venkata, Arun Kumar Somani
{"title":"阵列感知训练/剪枝:基于阵列的神经网络加速器有效前向传播方法","authors":"Krishna Teja Chitty-Venkata, Arun Kumar Somani","doi":"10.1109/ASAP49362.2020.00016","DOIUrl":null,"url":null,"abstract":"Due to the increase in the use of large-sized Deep Neural Networks (DNNs) over the years, specialized hardware accelerators such as Tensor Processing Unit and Eyeriss have been developed to accelerate the forward pass of the network. The essential component of these devices is an array processor which is composed of multiple individual compute units for efficiently executing Multiplication and Accumulation (MAC) operation. As the size of this array limits the amount of DNN processing of a single layer, the computation is performed in several batches serially leading to extra compute cycles along both the axes. In practice, due to the mismatch between matrix and array sizes, the computation does not map on the array exactly. In this work, we address the issue of minimizing processing cycles on the array by adjusting the DNN model parameters by using a structured hardware array dependent optimization. We introduce two techniques in this paper: Array Aware Training (AAT) for efficient training and Array Aware Pruning (AAP) for efficient inference. Weight pruning is an approach to remove redundant parameters in the network to decrease the size of the network. The key idea behind pruning in this paper is to adjust the model parameters (the weight matrix) so that the array is fully utilized in each computation batch. Our goal is to compress the model based on the size of the array so as to reduce the number of computation cycles. We observe that both the proposed techniques results into similar accuracy as the original network while saving a significant number of processing cycles (75%).","PeriodicalId":375691,"journal":{"name":"2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Array Aware Training/Pruning: Methods for Efficient Forward Propagation on Array-based Neural Network Accelerators\",\"authors\":\"Krishna Teja Chitty-Venkata, Arun Kumar Somani\",\"doi\":\"10.1109/ASAP49362.2020.00016\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to the increase in the use of large-sized Deep Neural Networks (DNNs) over the years, specialized hardware accelerators such as Tensor Processing Unit and Eyeriss have been developed to accelerate the forward pass of the network. The essential component of these devices is an array processor which is composed of multiple individual compute units for efficiently executing Multiplication and Accumulation (MAC) operation. As the size of this array limits the amount of DNN processing of a single layer, the computation is performed in several batches serially leading to extra compute cycles along both the axes. In practice, due to the mismatch between matrix and array sizes, the computation does not map on the array exactly. In this work, we address the issue of minimizing processing cycles on the array by adjusting the DNN model parameters by using a structured hardware array dependent optimization. We introduce two techniques in this paper: Array Aware Training (AAT) for efficient training and Array Aware Pruning (AAP) for efficient inference. 
Weight pruning is an approach to remove redundant parameters in the network to decrease the size of the network. The key idea behind pruning in this paper is to adjust the model parameters (the weight matrix) so that the array is fully utilized in each computation batch. Our goal is to compress the model based on the size of the array so as to reduce the number of computation cycles. We observe that both the proposed techniques results into similar accuracy as the original network while saving a significant number of processing cycles (75%).\",\"PeriodicalId\":375691,\"journal\":{\"name\":\"2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)\",\"volume\":\"45 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASAP49362.2020.00016\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASAP49362.2020.00016","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

Due to the increasing use of large Deep Neural Networks (DNNs) over the years, specialized hardware accelerators such as the Tensor Processing Unit and Eyeriss have been developed to accelerate the forward pass of the network. The essential component of these devices is an array processor composed of multiple individual compute units that efficiently execute Multiplication and Accumulation (MAC) operations. As the size of this array limits how much of a single layer can be processed at once, the computation is performed serially in several batches, leading to extra compute cycles along both axes. In practice, because matrix and array sizes do not match, the computation does not map onto the array exactly. In this work, we address the problem of minimizing processing cycles on the array by adjusting the DNN model parameters through a structured, hardware-array-dependent optimization. We introduce two techniques in this paper: Array Aware Training (AAT) for efficient training and Array Aware Pruning (AAP) for efficient inference. Weight pruning removes redundant parameters from the network to decrease its size. The key idea behind pruning in this paper is to adjust the model parameters (the weight matrix) so that the array is fully utilized in each computation batch. Our goal is to compress the model based on the size of the array so as to reduce the number of computation cycles. We observe that both proposed techniques yield accuracy similar to the original network while saving a significant number of processing cycles (75%).
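To make the batching arithmetic concrete, below is a minimal Python sketch of the cycle-count reasoning and the pruning idea the abstract describes. The 16x16 array size, the names `compute_batches` and `array_aware_prune`, and the magnitude-based row/column selection are all illustrative assumptions; the abstract does not specify the paper's actual array dimensions or pruning criterion.

```python
import math

import numpy as np

# Hypothetical systolic-array dimensions, e.g. a 16x16 grid of MAC units.
ARRAY_ROWS, ARRAY_COLS = 16, 16


def compute_batches(weight_rows: int, weight_cols: int) -> int:
    """Serial batches needed to map a weight matrix onto the fixed-size
    array: the matrix is tiled along both axes, and a partial tile still
    costs a full batch."""
    return math.ceil(weight_rows / ARRAY_ROWS) * math.ceil(weight_cols / ARRAY_COLS)


def array_aware_prune(weights: np.ndarray) -> np.ndarray:
    """Structured pruning sketch (an assumed criterion, not the paper's):
    shrink each dimension to the nearest lower multiple of the array size,
    dropping the rows/columns with the smallest L2 norm so that every
    remaining batch fully utilizes the array."""
    rows, cols = weights.shape
    keep_rows = (rows // ARRAY_ROWS) * ARRAY_ROWS
    keep_cols = (cols // ARRAY_COLS) * ARRAY_COLS
    if keep_rows == 0 or keep_cols == 0:
        raise ValueError("layer smaller than one array tile; leave it unpruned")
    # Rank rows/columns by magnitude, keep the strongest, preserve order.
    row_idx = np.sort(np.argsort(np.linalg.norm(weights, axis=1))[-keep_rows:])
    col_idx = np.sort(np.argsort(np.linalg.norm(weights, axis=0))[-keep_cols:])
    return weights[row_idx][:, col_idx]


w = np.random.randn(100, 200)
print(compute_batches(*w.shape))                     # ceil(100/16)*ceil(200/16) = 7*13 = 91
print(compute_batches(*array_aware_prune(w).shape))  # (96/16)*(192/16) = 6*12 = 72
```

In this toy example, trimming a 100x200 weight matrix to 96x192 removes the partially filled tiles and cuts the batch count from 91 to 72; the 75% cycle savings reported in the abstract would come from the paper's full AAT/AAP procedures rather than this simple rounding.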