iFPNA: A Flexible and Efficient Deep Neural Network Accelerator with a Programmable Data Flow Engine in 28nm CMOS

Chixiao Chen, Xindi Liu, Huwan Peng, Hongwei Ding, C. R. Shi
DOI: 10.1109/ESSCIRC.2018.8494327
Published in: ESSCIRC 2018 - IEEE 44th European Solid State Circuits Conference (ESSCIRC), September 2018
Citations: 8

Abstract

This paper presents iFPNA, an instruction-and-fabric programmable neuron array: a general-purpose deep learning accelerator that achieves both energy efficiency and flexibility. The iFPNA has a programmable data flow engine with a custom instruction set, and 16 configurable neuron slices for parallel neuron operations at different bit-widths. Convolutional neural networks with different kernel sizes are implemented by selecting among input-stationary, row-stationary, tunnel-stationary, and other data flows. Recurrent neural networks with element-wise operations are implemented by a universal activation engine. Measurement results show that the iFPNA achieves a peak energy efficiency of 1.72 TOPS/W at a 30 MHz clock rate and a 0.63 V supply voltage. At a 125 MHz clock rate, the measured latency is 60.8 ms on AlexNet and 40 ms on LSTM-512.
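To illustrate the dataflow choices the abstract mentions, the sketch below contrasts two loop orderings for a plain 2D convolution. This is not the iFPNA implementation (the paper maps these dataflows onto hardware neuron slices); it is only a minimal software analogy showing which operand is reused, i.e. kept "stationary", in the innermost loops. The function names and the output-scatter formulation are illustrative assumptions.

```python
def conv2d_input_stationary(inp, ker):
    # Input stationary: each input pixel is read once and its partial
    # products are scattered to every output position it contributes to.
    H, W = len(inp), len(inp[0])
    K = len(ker)
    OH, OW = H - K + 1, W - K + 1
    out = [[0.0] * OW for _ in range(OH)]
    for y in range(H):
        for x in range(W):
            v = inp[y][x]                      # held stationary here
            for ky in range(K):
                for kx in range(K):
                    oy, ox = y - ky, x - kx
                    if 0 <= oy < OH and 0 <= ox < OW:
                        out[oy][ox] += v * ker[ky][kx]
    return out


def conv2d_row_stationary(inp, ker):
    # Row stationary: an input row is paired with one kernel row, so a
    # whole row of partial sums is accumulated per (oy, ky) step.
    H, W = len(inp), len(inp[0])
    K = len(ker)
    OH, OW = H - K + 1, W - K + 1
    out = [[0.0] * OW for _ in range(OH)]
    for oy in range(OH):
        for ky in range(K):
            row = inp[oy + ky]                 # input row reused across ox
            for ox in range(OW):
                for kx in range(K):
                    out[oy][ox] += row[ox + kx] * ker[ky][kx]
    return out
```

Both orderings compute the same convolution; they differ only in memory-access pattern, which is what a hardware dataflow choice trades off (input reuse vs. partial-sum locality).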