Dongjoo Shin, Jinmook Lee, Jinsu Lee, Juhyoung Lee, H. Yoo
An energy-efficient deep learning processor with heterogeneous multi-core architecture for convolutional neural networks and recurrent neural networks
2017 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS), 2017-04-19
DOI: 10.1109/CoolChips.2017.7946376
Citations: 10
Abstract
An energy-efficient deep learning processor is proposed for convolutional neural networks (CNNs) and recurrent neural networks (RNNs) on mobile platforms. The 16 mm² chip is fabricated in 65 nm technology and has three key features: 1) a reconfigurable heterogeneous architecture that supports both CNNs and RNNs; 2) a LUT-based reconfigurable multiplier optimized for dynamic fixed-point with on-line adaptation; and 3) quantization table-based matrix multiplication, which reduces off-chip memory access and removes duplicated multiplications. As a result, this work shows 20× and 4.5× higher energy efficiency than [2] and [3], respectively. DNPU also shows 6.5× higher energy efficiency than [5].
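The abstract does not describe how the quantization table-based matrix multiplication is implemented. The following is only a minimal software sketch of the general idea, under the assumption that each weight is stored as a small index into a shared codebook of quantized values. Each input is then multiplied once per codebook entry, and every repeated weight level reuses that product through a lookup. The function names, the codebook, and the data layout are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: names, codebook, and layout are assumptions,
# not the chip's actual datapath.

def build_product_table(inputs, codebook):
    """Precompute input * level for every input and every codebook level."""
    return [[x * level for level in codebook] for x in inputs]


def table_matvec(weight_indices, inputs, codebook):
    """Matrix-vector product using table lookups instead of per-weight multiplies.

    weight_indices[i][j] is the codebook index of the weight at row i, column j.
    Multiplications performed: len(inputs) * len(codebook), independent of the
    number of rows.
    """
    table = build_product_table(inputs, codebook)
    return [
        sum(table[j][k] for j, k in enumerate(row))
        for row in weight_indices
    ]


def direct_matvec(weight_indices, inputs, codebook):
    """Reference version: one multiplication per weight."""
    return [
        sum(codebook[k] * x for k, x in zip(row, inputs))
        for row in weight_indices
    ]
```

In this sketch the weight matrix is held as indices rather than full-precision values, which is one way a scheme like this could lower memory traffic. The saving in multiplications grows with the number of rows, since the table is built once per input vector.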