Chixiao Chen, Xindi Liu, Huwan Peng, Hongwei Ding, C. R. Shi
Published in: ESSCIRC 2018 - IEEE 44th European Solid State Circuits Conference (ESSCIRC), September 2018
DOI: 10.1109/ESSCIRC.2018.8494327
iFPNA: A Flexible and Efficient Deep Neural Network Accelerator with a Programmable Data Flow Engine in 28nm CMOS
The paper presents iFPNA, an instruction-and-fabric programmable neuron array: a general-purpose deep learning accelerator that achieves both energy efficiency and flexibility. The iFPNA combines a programmable data flow engine with a custom instruction set and 16 configurable neuron slices for parallel neuron operations at different bit-widths. Convolutional neural networks with different kernel sizes are implemented by selecting among data flows such as input stationary, row stationary, and tunnel stationary. Recurrent neural networks with element-wise operations are supported by a universal activation engine. Measurement results show that the iFPNA achieves a peak energy efficiency of 1.72 TOPS/W at a 30 MHz clock rate and a 0.63 V supply voltage. At a 125 MHz clock rate, the measured latency is 60.8 ms on AlexNet and 40 ms on LSTM-512.
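The dataflow choices the abstract mentions (input stationary, row stationary, etc.) are loop orderings that decide which operand is held fixed in a processing element while the others stream past. A minimal software sketch of the idea, using a 1-D convolution and NumPy: the function and variable names are illustrative assumptions, not the iFPNA's actual hardware dataflows, and "input stationary" here simply means the outer loop pins one input element while all weights that reuse it are applied.

```python
import numpy as np

def conv_input_stationary(x, w):
    """Input-stationary ordering: hold each input element fixed and
    stream every weight that reuses it, scattering partial sums to
    the outputs it contributes to (illustrative sketch only)."""
    K = len(w)
    out = np.zeros(len(x) - K + 1)
    for i, xi in enumerate(x):        # pin one input value "stationary"
        for k in range(K):            # stream the weights past it
            o = i - k                 # output index this pair feeds
            if 0 <= o < len(out):
                out[o] += xi * w[k]
    return out

def conv_output_stationary(x, w):
    """Output-stationary reference: fully accumulate each output
    before moving on. Both orderings compute the same result; they
    differ only in which operand is reused in place."""
    K = len(w)
    out = np.zeros(len(x) - K + 1)
    for o in range(len(out)):
        for k in range(K):
            out[o] += x[o + k] * w[k]
    return out
```

Both loop nests perform the same multiply-accumulates, so they agree with `np.correlate(x, w, mode='valid')`; in hardware, the choice of ordering determines whether inputs, weights, or partial sums stay resident in the PE registers, which is what the accelerator's programmable data flow engine selects per kernel size.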