{"title":"A 1.32 TOPS/W Energy Efficient Deep Neural Network Learning Processor with Direct Feedback Alignment based Heterogeneous Core Architecture","authors":"Donghyeon Han, Jinsu Lee, Jinmook Lee, H. Yoo","doi":"10.23919/VLSIC.2019.8778006","DOIUrl":null,"url":null,"abstract":"An energy efficient deep neural network (DNN) learning processor is proposed using direct feedback alignment (DFA). The proposed processor achieves $2.2 \\times$ faster learning speed compared with the previous learning processors by the pipelined DFA (PDFA). In order to enhance the energy efficiency by 38.7%, the heterogeneous learning core (LC) architecture is optimized with the 11-stage pipeline data-path. Furthermore, direct error propagation core (DEPC) utilizes random number generators (RNG) to remove external memory access (EMA) caused by error propagation (EP) and improve the energy efficiency by 19.9%. The proposed PDFA based learning processor is evaluated on the object tracking (OT) application, and as a result, it shows 34.4 frames-per-second (FPS) throughput with 1.32 TOPS/W energy efficiency.","PeriodicalId":6707,"journal":{"name":"2019 Symposium on VLSI Circuits","volume":"19 1","pages":"C304-C305"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 Symposium on VLSI Circuits","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/VLSIC.2019.8778006","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 24
Abstract
An energy efficient deep neural network (DNN) learning processor is proposed using direct feedback alignment (DFA). The proposed processor achieves $2.2 \times$ faster learning speed compared with the previous learning processors by the pipelined DFA (PDFA). In order to enhance the energy efficiency by 38.7%, the heterogeneous learning core (LC) architecture is optimized with the 11-stage pipeline data-path. Furthermore, direct error propagation core (DEPC) utilizes random number generators (RNG) to remove external memory access (EMA) caused by error propagation (EP) and improve the energy efficiency by 19.9%. The proposed PDFA based learning processor is evaluated on the object tracking (OT) application, and as a result, it shows 34.4 frames-per-second (FPS) throughput with 1.32 TOPS/W energy efficiency.