总线宽度感知芯片外存储器访问最小化CNN加速器

2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) Pub Date : 2020-07-01 DOI:10.1109/ISVLSI49217.2020.00051

S. Tewari, Anshul Kumar, K. Paul

{"title":"总线宽度感知芯片外存储器访问最小化CNN加速器","authors":"S. Tewari, Anshul Kumar, K. Paul","doi":"10.1109/ISVLSI49217.2020.00051","DOIUrl":null,"url":null,"abstract":"Convolutional Neural Network (CNN) accelerators have gained popularity due to their ability to speed up the CNN based applications. However, the energy efficiency of these accelerators is limiting their ubiquitous usage in energy-constrained devices. A significant fraction of their energy consumption results from off-chip memory accesses. In order to get high throughput, these accelerators connect to off-chip memory by a wide data bus. However, accessing the data of size, not a multiple of the bus width, results in wastage of energy. We observed that off-chip memory accesses could be reduced significantly by partitioning the data that optimally utilizes bus width and increases the number of aligned accesses. In this work, we propose a bus width aware approach to determine the optimal partition of the convolution layers to reduce the off-chip memory accesses. Our tool evaluates the off-chip memory accesses for different data partitions, and data reuse schemes to find the optimal partition. We have experimented with two popular CNNs, VGG16 and AlexNet. Our approach reduces off-chip memory accesses of VGG16 by 16% and 29% and of AlexNet by 9% and 16% on 64 and 128 bits data bus, respectively, compared to the state of the art approach.","PeriodicalId":423851,"journal":{"name":"2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Bus Width Aware Off-Chip Memory Access Minimization for CNN Accelerators\",\"authors\":\"S. Tewari, Anshul Kumar, K. Paul\",\"doi\":\"10.1109/ISVLSI49217.2020.00051\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Convolutional Neural Network (CNN) accelerators have gained popularity due to their ability to speed up the CNN based applications. However, the energy efficiency of these accelerators is limiting their ubiquitous usage in energy-constrained devices. A significant fraction of their energy consumption results from off-chip memory accesses. In order to get high throughput, these accelerators connect to off-chip memory by a wide data bus. However, accessing the data of size, not a multiple of the bus width, results in wastage of energy. We observed that off-chip memory accesses could be reduced significantly by partitioning the data that optimally utilizes bus width and increases the number of aligned accesses. In this work, we propose a bus width aware approach to determine the optimal partition of the convolution layers to reduce the off-chip memory accesses. Our tool evaluates the off-chip memory accesses for different data partitions, and data reuse schemes to find the optimal partition. We have experimented with two popular CNNs, VGG16 and AlexNet. Our approach reduces off-chip memory accesses of VGG16 by 16% and 29% and of AlexNet by 9% and 16% on 64 and 128 bits data bus, respectively, compared to the state of the art approach.\",\"PeriodicalId\":423851,\"journal\":{\"name\":\"2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)\",\"volume\":\"10 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISVLSI49217.2020.00051\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISVLSI49217.2020.00051","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

卷积神经网络(CNN)加速器因其加速基于CNN的应用程序的能力而受到欢迎。然而，这些加速器的能源效率限制了它们在能源受限设备中的普遍使用。它们能量消耗的很大一部分来自片外存储器访问。为了获得高吞吐量，这些加速器通过宽数据总线连接到片外存储器。但是，访问大小的数据，而不是总线宽度的倍数，会导致能源的浪费。我们观察到，通过对数据进行分区，以最佳方式利用总线宽度并增加对齐访问的数量，可以显著减少片外内存访问。在这项工作中，我们提出了一种总线宽度感知的方法来确定卷积层的最佳划分，以减少片外存储器访问。我们的工具评估了不同数据分区的片外内存访问，以及数据重用方案，以找到最佳分区。我们用两个流行的cnn, VGG16和AlexNet做了实验。与目前最先进的方法相比，我们的方法在64位和128位数据总线上分别减少了VGG16的16%和29%和AlexNet的9%和16%的片外存储器访问。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Bus Width Aware Off-Chip Memory Access Minimization for CNN Accelerators

Convolutional Neural Network (CNN) accelerators have gained popularity due to their ability to speed up the CNN based applications. However, the energy efficiency of these accelerators is limiting their ubiquitous usage in energy-constrained devices. A significant fraction of their energy consumption results from off-chip memory accesses. In order to get high throughput, these accelerators connect to off-chip memory by a wide data bus. However, accessing the data of size, not a multiple of the bus width, results in wastage of energy. We observed that off-chip memory accesses could be reduced significantly by partitioning the data that optimally utilizes bus width and increases the number of aligned accesses. In this work, we propose a bus width aware approach to determine the optimal partition of the convolution layers to reduce the off-chip memory accesses. Our tool evaluates the off-chip memory accesses for different data partitions, and data reuse schemes to find the optimal partition. We have experimented with two popular CNNs, VGG16 and AlexNet. Our approach reduces off-chip memory accesses of VGG16 by 16% and 29% and of AlexNet by 9% and 16% on 64 and 128 bits data bus, respectively, compared to the state of the art approach.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

自引率

0.00%

发文量