A High Energy-Efficiency FPGA-Based LSTM Accelerator Architecture Design by Structured Pruning and Normalized Linear Quantization

Yong Zheng, Haigang Yang, Zhihong Huang, Tianli Li, Yiping Jia
{"title":"A High Energy-Efficiency FPGA-Based LSTM Accelerator Architecture Design by Structured Pruning and Normalized Linear Quantization","authors":"Yong Zheng, Haigang Yang, Zhihong Huang, Tianli Li, Yiping Jia","doi":"10.1109/ICFPT47387.2019.00045","DOIUrl":null,"url":null,"abstract":"LSTM (Long Short-Term Memory) is an artificial recurrent neural network (RNN) architecture and has been successfully applied to the areas where sequences of data need to be dealt with such as Natural Language Processing (NLP), speech recognition, etc. In this work, we explore an avenue to minimization of the LSTM inference part design based on FPGA for high performance and energy-efficiency. First, the model is pruned to create structured sparse features for the hardware-friendly purpose by using permuted block diagonal mask matrices. To further compress the model, we quantize the weights and activations following a normalized linear quantization approach. As a result, computational activities of the network are significantly deducted with an egligible loss on accuracy. Then a hardware architecture design has been devised to fully exploit the benefits of regular sparse structure. Having been implemented on Arria 10 (10AX115U4F45I3SG) FPGA running at 150 MHz, our accelerator demonstrates a peak performance of 2.22 TOPS at a power dissipation of 1.679 Watts. In comparison to the other FPGA-based LSTM accelerator designs previously reported, our approach achieves a 1.17-2.16x speedup in processing.","PeriodicalId":241340,"journal":{"name":"2019 International Conference on Field-Programmable Technology (ICFPT)","volume":"122 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Field-Programmable Technology (ICFPT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICFPT47387.2019.00045","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

LSTM (Long Short-Term Memory) is a recurrent neural network (RNN) architecture that has been successfully applied in areas where sequences of data must be processed, such as Natural Language Processing (NLP) and speech recognition. In this work, we explore an FPGA-based design of the LSTM inference stage targeting high performance and energy efficiency. First, the model is pruned with permuted block-diagonal mask matrices to create a structured, hardware-friendly sparsity pattern. To further compress the model, we quantize the weights and activations using a normalized linear quantization approach. As a result, the computational workload of the network is significantly reduced with a negligible loss in accuracy. A hardware architecture is then devised to fully exploit the benefits of the regular sparse structure. Implemented on an Arria 10 (10AX115U4F45I3SG) FPGA running at 150 MHz, our accelerator demonstrates a peak performance of 2.22 TOPS at a power dissipation of 1.679 W. Compared with previously reported FPGA-based LSTM accelerator designs, our approach achieves a 1.17-2.16x speedup in processing.
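The paper itself provides no code; the sketch below is only an illustrative NumPy rendering of the two compression ideas named in the abstract: a permuted block-diagonal pruning mask and a normalized linear quantizer. The block count, bit width, matrix sizes, and function names are assumptions for demonstration, not values or APIs from the paper.

```python
# Minimal sketch (not the authors' code) of permuted block-diagonal pruning
# and normalized linear quantization. All parameters below are illustrative.
import numpy as np

def permuted_block_diagonal_mask(rows, cols, p, perm=None, rng=None):
    """Partition the matrix into a p x p grid of blocks and keep exactly one
    block per block-row, chosen by a permutation of the p block columns.
    All other blocks are zeroed, giving a structured density of 1/p."""
    assert rows % p == 0 and cols % p == 0
    br, bc = rows // p, cols // p              # block height and width
    if perm is None:
        rng = rng or np.random.default_rng(0)
        perm = rng.permutation(p)              # block column kept by each block row
    mask = np.zeros((rows, cols), dtype=np.float32)
    for i in range(p):
        j = perm[i]
        mask[i * br:(i + 1) * br, j * bc:(j + 1) * bc] = 1.0
    return mask

def normalized_linear_quantize(x, bits=8):
    """Normalize x by its maximum magnitude, then quantize uniformly onto a
    signed integer grid; returns the integer codes and the step size."""
    scale = np.max(np.abs(x)) + 1e-12
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale * qmax), -qmax, qmax)
    return q.astype(np.int8), scale / qmax

# Example: prune and quantize one (hypothetical) LSTM gate weight matrix.
W = np.random.randn(256, 512).astype(np.float32)
mask = permuted_block_diagonal_mask(*W.shape, p=4)   # 75% structured sparsity
W_pruned = W * mask
codes, step = normalized_linear_quantize(W_pruned, bits=8)
W_dequant = codes.astype(np.float32) * step          # values the datapath would see
```

Because every block row keeps the same number of nonzero blocks, the resulting sparsity is regular, which is what lets a hardware datapath skip the zeroed blocks without load-imbalance; the quantizer similarly maps weights and activations to narrow integers so the multipliers can be small. This is only a software-level illustration of those ideas, not the accelerator design.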