An LSTM Acceleration Engine for FPGAs Based on Caffe Framework
Junhua He, Dazhong He, Yang Yang, Jun Liu, Jie Yang, Siye Wang
2019 IEEE 5th International Conference on Computer and Communications (ICCC), pp. 1286-1292, December 2019
DOI: 10.1109/ICCC47050.2019.9064358
Citations: 2
Abstract
Recently, Long Short-Term Memory (LSTM) networks have been widely used in sequence-related problems. LSTMs outperform conventional feed-forward neural networks and RNNs in many ways, since they selectively remember patterns over long durations. However, due to the recurrent nature of LSTMs, it is difficult to achieve high computational parallelism on general-purpose processors such as CPUs and GPUs. In addition, the large energy consumption of CPU and GPU computing is a non-negligible issue for data centers. To address these problems, the FPGA emerges as an ideal solution: it offers low power consumption and low latency, which are natural advantages for implementing recurrent neural networks such as LSTMs. In this paper, we propose an acceleration engine for LSTM networks based on FPGAs. By employing fixed-point arithmetic, systolic arrays for matrix multiplication, and lookup tables for the activation functions, we optimize the LSTM on the FPGA in depth. Additionally, we integrate the acceleration engine into Caffe, one of the most popular deep learning frameworks, to make it easier to deploy. According to the experimental results, our acceleration engine achieves 8.8X and 2.2X performance gains and 16.9X and 9.6X energy-efficiency gains compared with a CPU and a GPU, respectively.
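To make the two numerical optimizations named in the abstract concrete, the sketch below shows fixed-point arithmetic and a lookup-table sigmoid applied to a single LSTM gate. It is a minimal host-side C++ illustration, not the paper's implementation: the Q8.8 format, the 256-entry table, and the [-8, 8) input range are assumptions chosen for illustration, since the paper's abstract does not specify bit widths or table sizes.

```cpp
// Illustrative sketch: fixed-point arithmetic + LUT-based sigmoid for one LSTM gate.
// Q8.8 format, 256-entry table, and [-8, 8) range are assumed, not taken from the paper.
#include <array>
#include <cmath>
#include <cstdint>
#include <iostream>

using fix16 = int16_t;           // Q8.8 fixed point: 8 integer bits, 8 fraction bits
constexpr int FRAC_BITS = 8;

fix16 to_fix(float x)   { return static_cast<fix16>(std::lround(x * (1 << FRAC_BITS))); }
float to_float(fix16 x) { return static_cast<float>(x) / (1 << FRAC_BITS); }

// Fixed-point multiply, rescaled back to Q8.8.
fix16 fix_mul(fix16 a, fix16 b) {
    return static_cast<fix16>((static_cast<int32_t>(a) * b) >> FRAC_BITS);
}

// Sigmoid pre-tabulated over [-8, 8); inputs outside the range saturate
// to the table endpoints, which is a common LUT-activation simplification.
struct SigmoidLUT {
    static constexpr int N = 256;
    std::array<fix16, N> table{};
    SigmoidLUT() {
        for (int i = 0; i < N; ++i) {
            float x = -8.0f + 16.0f * i / N;
            table[i] = to_fix(1.0f / (1.0f + std::exp(-x)));
        }
    }
    fix16 operator()(fix16 x) const {
        // Map a Q8.8 input in [-8, 8) to a table index in [0, N).
        int idx = ((static_cast<int32_t>(x) + (8 << FRAC_BITS)) * N) >> (FRAC_BITS + 4);
        if (idx < 0) idx = 0;
        if (idx >= N) idx = N - 1;
        return table[idx];
    }
};

int main() {
    SigmoidLUT sigmoid;
    // One gate of an LSTM cell: g = sigmoid(w*x + u*h + b), all values in Q8.8.
    fix16 w = to_fix(0.5f),  x = to_fix(1.0f);
    fix16 u = to_fix(-0.25f), h = to_fix(0.8f);
    fix16 b = to_fix(0.1f);
    fix16 pre = static_cast<fix16>(fix_mul(w, x) + fix_mul(u, h) + b);
    std::cout << "gate output (LUT sigmoid) = " << to_float(sigmoid(pre)) << "\n";
    return 0;
}
```

On an FPGA these table lookups would typically live in on-chip block RAM and the multiply-accumulate would be laid out across a systolic array of DSP slices; the sketch only reproduces the numerics, not the hardware mapping.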