An LSTM Acceleration Engine for FPGAs Based on Caffe Framework
Junhua He, Dazhong He, Yang Yang, Jun Liu, Jie Yang, Siye Wang
2019 IEEE 5th International Conference on Computer and Communications (ICCC), pp. 1286-1292, December 2019
DOI: 10.1109/ICCC47050.2019.9064358
Citations: 2
Abstract
Recently, Long Short-Term Memory (LSTM) networks have been widely used in sequence-related problems. LSTMs outperform conventional feed-forward neural networks and RNNs in many ways, since they selectively remember patterns over long durations. However, due to the recurrent nature of LSTMs, it is difficult to achieve high computational parallelism on general-purpose processors such as CPUs and GPUs. In addition, the large energy consumption of CPU and GPU computing is a non-negligible issue for data centers. To address these problems, the FPGA emerges as an ideal solution: it offers low power consumption and low latency, which are natural advantages for implementing recurrent neural networks such as LSTMs. In this paper, we propose an acceleration engine for LSTM networks based on FPGAs. By employing fixed-point arithmetic, systolic arrays for matrix multiplication, and lookup tables for the activation functions, we optimize the LSTM on the FPGA in depth. Additionally, we integrate the acceleration engine into Caffe, one of the most popular deep learning frameworks, to make it easier to deploy. According to the experimental results, our acceleration engine achieves 8.8X and 2.2X performance gains and 16.9X and 9.6X energy-efficiency gains compared with a CPU and a GPU, respectively.
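To make the two numerical optimizations named in the abstract concrete, the sketch below shows fixed-point arithmetic and a lookup-table sigmoid applied to a single LSTM gate. It is a minimal host-side C++ illustration, not the paper's implementation: the Q8.8 format, the 256-entry table, and the [-8, 8) input range are assumptions chosen for illustration, since the paper's abstract does not specify bit widths or table sizes.

```cpp
// Illustrative sketch: fixed-point arithmetic + LUT-based sigmoid for one LSTM gate.
// Q8.8 format, 256-entry table, and [-8, 8) range are assumed, not taken from the paper.
#include <array>
#include <cmath>
#include <cstdint>
#include <iostream>

using fix16 = int16_t;           // Q8.8 fixed point: 8 integer bits, 8 fraction bits
constexpr int FRAC_BITS = 8;

fix16 to_fix(float x)   { return static_cast<fix16>(std::lround(x * (1 << FRAC_BITS))); }
float to_float(fix16 x) { return static_cast<float>(x) / (1 << FRAC_BITS); }

// Fixed-point multiply, rescaled back to Q8.8.
fix16 fix_mul(fix16 a, fix16 b) {
    return static_cast<fix16>((static_cast<int32_t>(a) * b) >> FRAC_BITS);
}

// Sigmoid pre-tabulated over [-8, 8); inputs outside the range saturate
// to the table endpoints, which is a common LUT-activation simplification.
struct SigmoidLUT {
    static constexpr int N = 256;
    std::array<fix16, N> table{};
    SigmoidLUT() {
        for (int i = 0; i < N; ++i) {
            float x = -8.0f + 16.0f * i / N;
            table[i] = to_fix(1.0f / (1.0f + std::exp(-x)));
        }
    }
    fix16 operator()(fix16 x) const {
        // Map a Q8.8 input in [-8, 8) to a table index in [0, N).
        int idx = ((static_cast<int32_t>(x) + (8 << FRAC_BITS)) * N) >> (FRAC_BITS + 4);
        if (idx < 0) idx = 0;
        if (idx >= N) idx = N - 1;
        return table[idx];
    }
};

int main() {
    SigmoidLUT sigmoid;
    // One gate of an LSTM cell: g = sigmoid(w*x + u*h + b), all values in Q8.8.
    fix16 w = to_fix(0.5f),  x = to_fix(1.0f);
    fix16 u = to_fix(-0.25f), h = to_fix(0.8f);
    fix16 b = to_fix(0.1f);
    fix16 pre = static_cast<fix16>(fix_mul(w, x) + fix_mul(u, h) + b);
    std::cout << "gate output (LUT sigmoid) = " << to_float(sigmoid(pre)) << "\n";
    return 0;
}
```

On an FPGA these table lookups would typically live in on-chip block RAM and the multiply-accumulate would be laid out across a systolic array of DSP slices; the sketch only reproduces the numerics, not the hardware mapping.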