An LSTM Acceleration Engine for FPGAs Based on Caffe Framework

Junhua He, Dazhong He, Yang Yang, Jun Liu, Jie Yang, Siye Wang
{"title":"基于Caffe框架的fpga LSTM加速引擎","authors":"Junhua He, Dazhong He, Yang Yang, Jun Liu, Jie Yang, Siye Wang","doi":"10.1109/ICCC47050.2019.9064358","DOIUrl":null,"url":null,"abstract":"Recently, Long Short Term Memory (LSTM) networks have been widely used in sequence-related problem. LSTMs outperform conventional feed-forward neural networks and RNNs in many ways, since they remember patterns selectively for long durations of time. However, due to the recurrent property of LSTMs, it is hard to implement a high computing parallelism on general processors such as CPUs and GPUs. Besides, the huge energy consumption of GPU and CPU computing is a non-negligible issue for data centers. In order to solve the problems above, FPGA emerges as an ideal solution. It has the characteristics of low power and latency, which has natural advantages for the implementation of recurrent neural networks, such as LSTMs. In this paper, we propose to implement an acceleration engine for LSTM network based on FPGAs. By employing fixed-point arithmetic, systolic arrays for matrix multiplication and look up table for activate function, we optimize the LSTM on FPGA in depth. Additionally, we integrate the acceleration engine into Caffe, one of the most popular deep learning framework, to make it easier to deploy. According to the experimental results, our acceleration engine achieves 8.8X and 2.2X gains for performance, 16.9X and 9.6X gains for energy efficiency compared with CPU and GPU, respectively.","PeriodicalId":6739,"journal":{"name":"2019 IEEE 5th International Conference on Computer and Communications (ICCC)","volume":"49 1","pages":"1286-1292"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"An LSTM Acceleration Engine for FPGAs Based on Caffe Framework\",\"authors\":\"Junhua He, Dazhong He, Yang Yang, Jun Liu, Jie Yang, Siye Wang\",\"doi\":\"10.1109/ICCC47050.2019.9064358\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, Long Short Term Memory (LSTM) networks have been widely used in sequence-related problem. LSTMs outperform conventional feed-forward neural networks and RNNs in many ways, since they remember patterns selectively for long durations of time. However, due to the recurrent property of LSTMs, it is hard to implement a high computing parallelism on general processors such as CPUs and GPUs. Besides, the huge energy consumption of GPU and CPU computing is a non-negligible issue for data centers. In order to solve the problems above, FPGA emerges as an ideal solution. It has the characteristics of low power and latency, which has natural advantages for the implementation of recurrent neural networks, such as LSTMs. In this paper, we propose to implement an acceleration engine for LSTM network based on FPGAs. By employing fixed-point arithmetic, systolic arrays for matrix multiplication and look up table for activate function, we optimize the LSTM on FPGA in depth. Additionally, we integrate the acceleration engine into Caffe, one of the most popular deep learning framework, to make it easier to deploy. 
According to the experimental results, our acceleration engine achieves 8.8X and 2.2X gains for performance, 16.9X and 9.6X gains for energy efficiency compared with CPU and GPU, respectively.\",\"PeriodicalId\":6739,\"journal\":{\"name\":\"2019 IEEE 5th International Conference on Computer and Communications (ICCC)\",\"volume\":\"49 1\",\"pages\":\"1286-1292\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE 5th International Conference on Computer and Communications (ICCC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCC47050.2019.9064358\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 5th International Conference on Computer and Communications (ICCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCC47050.2019.9064358","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2

Abstract

Recently, Long Short-Term Memory (LSTM) networks have been widely used in sequence-related problems. LSTMs outperform conventional feed-forward neural networks and plain RNNs in many ways, since they selectively remember patterns over long durations. However, due to the recurrent nature of LSTMs, it is hard to achieve high computational parallelism on general-purpose processors such as CPUs and GPUs. Moreover, the high energy consumption of CPU and GPU computing is a non-negligible issue for data centers. FPGAs emerge as an ideal solution to these problems: their low power consumption and low latency give them natural advantages for implementing recurrent neural networks such as LSTMs. In this paper, we propose an FPGA-based acceleration engine for LSTM networks. By employing fixed-point arithmetic, systolic arrays for matrix multiplication, and lookup tables for the activation functions, we optimize the LSTM implementation on the FPGA in depth. Additionally, we integrate the acceleration engine into Caffe, one of the most popular deep learning frameworks, to make it easier to deploy. According to the experimental results, our acceleration engine achieves 8.8x and 2.2x performance gains and 16.9x and 9.6x energy-efficiency gains over a CPU and a GPU, respectively.
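For reference, a standard LSTM cell computes the following at each time step t (textbook formulation; the abstract does not spell out the exact variant the engine implements):

```latex
% Standard LSTM cell equations (textbook formulation; whether the paper's
% variant includes e.g. peephole connections is not stated in the abstract).
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The dependence of c_t and h_t on c_{t-1} and h_{t-1} is the recurrence that serializes time steps on CPUs and GPUs, while the eight matrix-vector products inside a single step are exactly the part a systolic array can parallelize.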
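A minimal sketch of the fixed-point arithmetic idea, assuming a hypothetical 16-bit Q8.8 format (the abstract does not state the bit widths the engine actually uses):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Q8.8 fixed point: 8 integer bits, 8 fractional bits. The format is an
// illustrative assumption, not the paper's actual quantization scheme.
constexpr int kFracBits = 8;
constexpr float kScale = 1 << kFracBits;  // 2^8 = 256

using fix16_t = int16_t;

// Quantize a float to Q8.8 with saturation at the int16 range.
fix16_t to_fixed(float x) {
    float scaled = std::round(x * kScale);
    scaled = std::min(scaled, 32767.0f);
    scaled = std::max(scaled, -32768.0f);
    return static_cast<fix16_t>(scaled);
}

float to_float(fix16_t x) { return static_cast<float>(x) / kScale; }

// Fixed-point multiply: the 32-bit product carries 16 fractional bits,
// so shift right by kFracBits (truncating) to return to Q8.8.
fix16_t fix_mul(fix16_t a, fix16_t b) {
    int32_t prod = static_cast<int32_t>(a) * static_cast<int32_t>(b);
    return static_cast<fix16_t>(prod >> kFracBits);
}
```

On an FPGA, replacing 32-bit floating point with narrow fixed point like this is what allows each DSP slice to perform a full multiply-accumulate and is the usual source of the power savings the abstract reports.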
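A software emulation of an output-stationary systolic array for matrix multiplication, the dataflow commonly used for this purpose; the 3x3 geometry and the cycle-by-cycle model are illustrative, as the abstract does not describe the paper's array:

```cpp
#include <cstdio>

// Emulates an output-stationary systolic array computing C = A * B.
// PE(i, j) accumulates C[i][j] while A's rows stream in from the left and
// B's columns stream in from the top, each skewed by one cycle per index.
int main() {
    constexpr int N = 3;
    int A[N][N] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    int B[N][N] = {{9, 8, 7}, {6, 5, 4}, {3, 2, 1}};
    int C[N][N] = {};

    // Operand registers of each PE; A values move one PE right per cycle,
    // B values move one PE down per cycle.
    int a_reg[N][N] = {}, b_reg[N][N] = {};

    int total_cycles = 3 * N - 2;  // time for the skewed wavefront to drain
    for (int t = 0; t < total_cycles; ++t) {
        // Shift operands through the grid (reverse order to avoid overwrite).
        for (int i = 0; i < N; ++i)
            for (int j = N - 1; j > 0; --j) a_reg[i][j] = a_reg[i][j - 1];
        for (int i = N - 1; i > 0; --i)
            for (int j = 0; j < N; ++j) b_reg[i][j] = b_reg[i - 1][j];

        // Inject the skewed input wavefronts at the array edges.
        for (int i = 0; i < N; ++i) {
            int k = t - i;  // row/column i is delayed by i cycles
            a_reg[i][0] = (k >= 0 && k < N) ? A[i][k] : 0;
            b_reg[0][i] = (k >= 0 && k < N) ? B[k][i] : 0;
        }

        // Every PE performs one multiply-accumulate per cycle.
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) C[i][j] += a_reg[i][j] * b_reg[i][j];
    }

    // Expect A*B: 30 24 18 / 84 69 54 / 138 114 90.
    for (int i = 0; i < N; ++i)
        printf("%d %d %d\n", C[i][0], C[i][1], C[i][2]);
    return 0;
}
```

Each PE needs only nearest-neighbor communication and one multiply-accumulate per cycle, which is why this structure maps so naturally onto a grid of FPGA DSP slices.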
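A sketch of a lookup-table activation, here for the sigmoid; the table size and input range are illustrative assumptions (on the FPGA the table would typically sit in on-chip block RAM):

```cpp
#include <array>
#include <cmath>

// LUT sigmoid sketch. A 1024-entry table over [-8, 8] is an assumption for
// illustration; the paper's table parameters are not given in the abstract.
constexpr int kTableSize = 1024;
constexpr float kInMin = -8.0f, kInMax = 8.0f;

// Precompute sigmoid samples once (at synthesis/initialization time).
std::array<float, kTableSize> build_sigmoid_table() {
    std::array<float, kTableSize> table{};
    for (int i = 0; i < kTableSize; ++i) {
        float x = kInMin + (kInMax - kInMin) * i / (kTableSize - 1);
        table[i] = 1.0f / (1.0f + std::exp(-x));
    }
    return table;
}

// At inference time a clamp plus one index computation replaces exp(),
// turning the activation into a single memory read.
float sigmoid_lut(const std::array<float, kTableSize>& table, float x) {
    if (x <= kInMin) return 0.0f;  // sigmoid saturates outside the range
    if (x >= kInMax) return 1.0f;
    int idx = static_cast<int>((x - kInMin) / (kInMax - kInMin)
                               * (kTableSize - 1));
    return table[idx];
}
```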
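The abstract says the engine is integrated into Caffe but does not describe the mechanism; one plausible route is a custom layer that offloads its forward pass to the accelerator. The sketch below uses Caffe's real Layer<Dtype> interface, but the fpga_lstm_forward driver call is hypothetical:

```cpp
#include <vector>

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"

// Hypothetical FPGA driver entry point -- not a real API from the paper or
// from Caffe. Stubbed as a copy so the sketch is self-contained; a real
// implementation would DMA inputs and weights to the accelerator.
template <typename Dtype>
void fpga_lstm_forward(const Dtype* input, Dtype* output, int count) {
  for (int i = 0; i < count; ++i) output[i] = input[i];  // placeholder
}

namespace caffe {

// Sketch of a custom Caffe layer that hands its forward pass to the FPGA.
template <typename Dtype>
class FpgaLstmLayer : public Layer<Dtype> {
 public:
  explicit FpgaLstmLayer(const LayerParameter& param) : Layer<Dtype>(param) {}
  virtual inline const char* type() const { return "FpgaLstm"; }

  virtual void Reshape(const std::vector<Blob<Dtype>*>& bottom,
                       const std::vector<Blob<Dtype>*>& top) {
    top[0]->ReshapeLike(*bottom[0]);  // simplification: output mirrors input
  }

 protected:
  virtual void Forward_cpu(const std::vector<Blob<Dtype>*>& bottom,
                           const std::vector<Blob<Dtype>*>& top) {
    fpga_lstm_forward(bottom[0]->cpu_data(), top[0]->mutable_cpu_data(),
                      bottom[0]->count());
  }

  virtual void Backward_cpu(const std::vector<Blob<Dtype>*>& top,
                            const std::vector<bool>& propagate_down,
                            const std::vector<Blob<Dtype>*>& bottom) {
    // Inference-only accelerator: no backward pass on the FPGA.
  }
};

}  // namespace caffe
```

In a real integration the layer would also be registered with REGISTER_LAYER_CLASS(FpgaLstm) so that it can be named in prototxt network definitions.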