A High Efficiency Accelerator for Deep Neural Networks
Aliasger Zaidy, Andre Xian Ming Chang, Vinayak Gokhale, E. Culurciello
2018 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2), published 2018-03-01
DOI: 10.1109/EMC2.2018.00010 (https://doi.org/10.1109/EMC2.2018.00010)
Abstract
Deep Neural Networks (DNNs) are the current state of the art for tasks such as object detection, natural language processing, and semantic segmentation. These networks are massively parallel, hierarchical models in which each level of the hierarchy performs millions of operations on a single input. This enormous amount of parallel computation makes DNNs well suited to custom acceleration. Custom accelerators can provide real-time DNN inference at low power, thus enabling widespread embedded deployment. In this paper, we present Snowflake, a high-efficiency, low-power accelerator for DNNs. Snowflake was designed to achieve optimum occupancy at low bandwidths, and it is agnostic to the network architecture. Snowflake was implemented on the Xilinx Zynq XC7Z045 APSoC and achieves a peak performance of 128 G-ops/s. It maintains a throughput of 98 FPS on AlexNet while averaging 1.2 GB/s of memory bandwidth.
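To put the reported figures in per-frame terms, the following is a minimal back-of-the-envelope sketch in Python. It uses only the three numbers stated in the abstract; the function names are hypothetical, and it assumes that "GB" means 10^9 bytes and "G-ops" means 10^9 operations. The second quantity is only an upper bound on the work available per frame at peak rate, not a measured operation count for AlexNet.

```python
# Figures taken from the abstract; the unit interpretations are assumptions.
PEAK_OPS_PER_S = 128e9             # peak performance: 128 G-ops/s (assumed 10^9 ops)
ALEXNET_FPS = 98.0                 # sustained AlexNet throughput in frames per second
AVG_BANDWIDTH_BYTES_PER_S = 1.2e9  # average memory bandwidth: 1.2 GB/s (assumed 10^9 bytes)


def bytes_per_frame(bandwidth_bytes_per_s: float, fps: float) -> float:
    """Average off-chip memory traffic attributable to one frame."""
    return bandwidth_bytes_per_s / fps


def peak_ops_budget_per_frame(peak_ops_per_s: float, fps: float) -> float:
    """Upper bound on operations the accelerator could issue per frame at peak rate."""
    return peak_ops_per_s / fps


if __name__ == "__main__":
    traffic = bytes_per_frame(AVG_BANDWIDTH_BYTES_PER_S, ALEXNET_FPS)
    budget = peak_ops_budget_per_frame(PEAK_OPS_PER_S, ALEXNET_FPS)
    print(f"Average memory traffic per frame: {traffic / 1e6:.1f} MB")
    print(f"Peak operation budget per frame:  {budget / 1e9:.2f} G-ops")
```

Under these assumptions, the reported numbers correspond to roughly 12.2 MB of memory traffic per AlexNet frame and a peak budget of about 1.31 G-ops per frame.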