Bus Width Aware Off-Chip Memory Access Minimization for CNN Accelerators
S. Tewari, Anshul Kumar, K. Paul
DOI: 10.1109/ISVLSI49217.2020.00051
2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), July 2020
Citations: 3
Abstract
Convolutional Neural Network (CNN) accelerators have gained popularity due to their ability to speed up CNN-based applications. However, the energy efficiency of these accelerators limits their ubiquitous use in energy-constrained devices. A significant fraction of their energy consumption results from off-chip memory accesses. To achieve high throughput, these accelerators connect to off-chip memory through a wide data bus. However, accessing data whose size is not a multiple of the bus width wastes energy. We observed that off-chip memory accesses can be reduced significantly by partitioning the data so as to optimally utilize the bus width and increase the number of aligned accesses. In this work, we propose a bus-width-aware approach that determines the optimal partitioning of the convolution layers to reduce off-chip memory accesses. Our tool evaluates the off-chip memory accesses for different data partitions and data reuse schemes to find the optimal partition. We experimented with two popular CNNs, VGG16 and AlexNet. Compared to the state-of-the-art approach, our method reduces off-chip memory accesses of VGG16 by 16% and 29%, and of AlexNet by 9% and 16%, on 64-bit and 128-bit data buses, respectively.
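To illustrate the core observation behind the abstract, the following is a minimal sketch (not the paper's actual tool) of why a transfer whose size is not a multiple of the bus width wastes bandwidth, and why bus-aligned tile sizes reduce the transaction count. The byte sizes and tile choices below are hypothetical examples, not values from the paper.

```python
from math import ceil

def bus_transactions(num_bytes: int, bus_bytes: int) -> int:
    """Bus transactions needed to move num_bytes over a bus_bytes-wide bus.
    A transfer whose size is not a multiple of the bus width still occupies
    a whole final transaction, so the remainder bytes are wasted."""
    return ceil(num_bytes / bus_bytes)

def wasted_bytes(num_bytes: int, bus_bytes: int) -> int:
    """Bytes moved over the bus beyond the useful payload."""
    return bus_transactions(num_bytes, bus_bytes) * bus_bytes - num_bytes

# Hypothetical example: moving a 100-byte feature-map row tile-by-tile
# over a 128-bit (16-byte) bus, with two candidate tile sizes.
bus = 16
for tile in (10, 16):  # 10-byte tiles vs bus-aligned 16-byte tiles
    n_tiles = ceil(100 / tile)
    txns = sum(bus_transactions(min(tile, 100 - i * tile), bus)
               for i in range(n_tiles))
    print(f"tile={tile:2d} bytes -> {txns} bus transactions")
# The bus-aligned 16-byte tiling needs 7 transactions; the unaligned
# 10-byte tiling needs 10, since every tile wastes 6 bytes per transfer.
```

A partition-search tool in the spirit of the paper would evaluate such transaction counts over all candidate layer partitions and reuse schemes, and pick the partition with the minimum total.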