Efficient Compression Technique for NoC-based Deep Neural Network Accelerators
J. Lorandel, Habiba Lahdhiri, E. Bourdel, Salvatore Monteleone, M. Palesi
2020 23rd Euromicro Conference on Digital System Design (DSD), August 2020. DOI: 10.1109/DSD51259.2020.00037
Citations: 4
Abstract
Deep Neural Networks (DNNs) are powerful models that are widely used in many applications. However, such networks are computation- and memory-intensive, which makes their implementation difficult on hardware-constrained systems that may use a network-on-chip (NoC) as interconnect infrastructure. One way to reduce the traffic generated between memory and the processing elements is to compress the information before it is exchanged inside the network. In particular, our work focuses on reducing the huge number of DNN parameters, i.e., the weights. In this paper, we propose a flexible, low-complexity compression technique that preserves DNN performance, reducing both the memory footprint and the volume of data to be exchanged while requiring few hardware resources. The technique is evaluated on several DNN models, achieving a compression rate close to 80% without significant loss in accuracy on AlexNet, ResNet, or LeNet-5.
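The abstract only gives the high-level idea (compress the DNN weights before they traverse the NoC) and does not describe the actual algorithm. As a minimal sketch of how such a weight-compression step and its compression rate might be measured, the following assumes a simple magnitude-pruning plus sparse-encoding scheme; this is an illustrative stand-in, not the technique proposed in the paper, and all names (compress_weights, compression_rate, target_rate) are hypothetical.

```python
# Illustrative sketch only: the paper's compression algorithm is not specified in
# the abstract, so this uses magnitude-based pruning plus a (value, index) sparse
# encoding as a stand-in to show how a compression rate on DNN weights could be
# computed before sending them over an interconnect.
import numpy as np

def compress_weights(weights: np.ndarray, target_rate: float = 0.8):
    """Drop the smallest-magnitude entries so that roughly `target_rate` of the
    weights are removed, keeping only the surviving values and their indices."""
    flat = weights.ravel()
    k = int(len(flat) * (1.0 - target_rate))        # number of weights to keep
    keep_idx = np.argsort(np.abs(flat))[-k:]        # indices of the largest-magnitude weights
    values = flat[keep_idx].astype(np.float16)      # kept values, stored at reduced precision
    return values, keep_idx.astype(np.uint32), weights.shape

def compression_rate(original: np.ndarray, values: np.ndarray, indices: np.ndarray) -> float:
    """Fraction of bytes saved relative to the dense float32 representation."""
    original_bytes = original.size * 4               # dense float32 weights
    compressed_bytes = values.nbytes + indices.nbytes
    return 1.0 - compressed_bytes / original_bytes

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A hypothetical fully connected layer's weight matrix.
    w = rng.normal(size=(512, 512)).astype(np.float32)
    vals, idx, shape = compress_weights(w, target_rate=0.9)
    print(f"compression rate: {compression_rate(w, vals, idx):.2%}")
```

In this sketch the rate reported is purely a storage/traffic metric (bytes saved versus the dense float32 weights); evaluating the accuracy impact on models such as AlexNet, ResNet, or LeNet-5, as the paper does, would additionally require reloading the pruned weights into the network and re-running inference.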