{"title":"Hardware Efficient Convolution Processing Unit for Deep Neural Networks","authors":"Anakhi Hazarika, Soumyajit Poddar, H. Rahaman","doi":"10.1109/ISDCS.2019.8719278","DOIUrl":null,"url":null,"abstract":"Convolutional Neural Network (CNN) is a type of deep neural networks that are commonly used for object detection and classification. State-of-the-art hardware for training and inference of CNN architectures require a considerable amount of computation and memory intensive resources. CNN achieves greater accuracy at the cost of high computational complexity and large power consumption. To optimize the memory requirement, processing speed and power, it is crucial to design more efficient accelerator architecture for CNN computation. In this work, an overlap of spatially adjacent data is exploited in order to parallelize the movement of data. A fast, re-configurable hardware accelerator architecture along with optimized kernel design suitable for a variety of CNN models is proposed. Our design achieves 2.1x computational benefits over state-of the-art accelerator architectures.","PeriodicalId":293660,"journal":{"name":"2019 2nd International Symposium on Devices, Circuits and Systems (ISDCS)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 2nd International Symposium on Devices, Circuits and Systems (ISDCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISDCS.2019.8719278","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
Convolutional Neural Network (CNN) is a type of deep neural networks that are commonly used for object detection and classification. State-of-the-art hardware for training and inference of CNN architectures require a considerable amount of computation and memory intensive resources. CNN achieves greater accuracy at the cost of high computational complexity and large power consumption. To optimize the memory requirement, processing speed and power, it is crucial to design more efficient accelerator architecture for CNN computation. In this work, an overlap of spatially adjacent data is exploited in order to parallelize the movement of data. A fast, re-configurable hardware accelerator architecture along with optimized kernel design suitable for a variety of CNN models is proposed. Our design achieves 2.1x computational benefits over state-of the-art accelerator architectures.