A System-Level Transprecision FPGA Accelerator for BLSTM Using On-chip Memory Reshaping

D. Diamantopoulos, C. Hagleitner
{"title":"基于片上存储器重构的BLSTM系统级高精度FPGA加速器","authors":"D. Diamantopoulos, C. Hagleitner","doi":"10.1109/FPT.2018.00068","DOIUrl":null,"url":null,"abstract":"The large amount of processing and storage of modern neural networks challenges engineers to architect dedicated and tailored hardware with high energy efficiency. At the inflection point of choosing among the most appropriate acceleration platform, FPGAs offer a competitive advantage with their irregular parallelism and bit-level re-programmability, at the cost of development effort. One critical problem is the lack of a common development flow between CPU and FPGA that combines advantages of both software and hardware world, i.e. integrated programmability and adaptable acceleration. This work presents a system-level FPGA implementation framework for BLSTM-based neural networks acceleration that introduces a) flexible reduced-precision (transprecision) data-paths and b) on-chip memory reshaping for storing model parameters. By evaluating the proposed architecture to an OCR application, it was possible to decrease the energy-to-solution by 21.9x and 2.6x compared to that of a POWER8 processor and a P100 GPU, respectively.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"A System-Level Transprecision FPGA Accelerator for BLSTM Using On-chip Memory Reshaping\",\"authors\":\"D. Diamantopoulos, C. Hagleitner\",\"doi\":\"10.1109/FPT.2018.00068\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The large amount of processing and storage of modern neural networks challenges engineers to architect dedicated and tailored hardware with high energy efficiency. At the inflection point of choosing among the most appropriate acceleration platform, FPGAs offer a competitive advantage with their irregular parallelism and bit-level re-programmability, at the cost of development effort. One critical problem is the lack of a common development flow between CPU and FPGA that combines advantages of both software and hardware world, i.e. integrated programmability and adaptable acceleration. This work presents a system-level FPGA implementation framework for BLSTM-based neural networks acceleration that introduces a) flexible reduced-precision (transprecision) data-paths and b) on-chip memory reshaping for storing model parameters. 
By evaluating the proposed architecture to an OCR application, it was possible to decrease the energy-to-solution by 21.9x and 2.6x compared to that of a POWER8 processor and a P100 GPU, respectively.\",\"PeriodicalId\":434541,\"journal\":{\"name\":\"2018 International Conference on Field-Programmable Technology (FPT)\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 International Conference on Field-Programmable Technology (FPT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FPT.2018.00068\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 International Conference on Field-Programmable Technology (FPT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FPT.2018.00068","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6

Abstract

The large processing and storage demands of modern neural networks challenge engineers to architect dedicated, tailored hardware with high energy efficiency. When choosing the most appropriate acceleration platform, FPGAs offer a competitive advantage through their irregular parallelism and bit-level re-programmability, at the cost of development effort. One critical problem is the lack of a common development flow between CPU and FPGA that combines the advantages of both the software and hardware worlds, i.e., integrated programmability and adaptable acceleration. This work presents a system-level FPGA implementation framework for accelerating BLSTM-based neural networks that introduces a) flexible reduced-precision (transprecision) data-paths and b) on-chip memory reshaping for storing model parameters. Evaluated on an OCR application, the proposed architecture decreases the energy-to-solution by 21.9x and 2.6x relative to a POWER8 processor and a P100 GPU, respectively.
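
The two mechanisms named in the abstract map naturally onto standard high-level-synthesis constructs. Below is a minimal sketch in Vivado-HLS-style C++ of how a transprecision data-path and on-chip weight reshaping could look; all type widths, array dimensions, the function name gate_preactivation, and the reshape/unroll factor of 8 are illustrative assumptions, not figures from the paper, and ARRAY_RESHAPE is one standard HLS mechanism for packing narrow words into wide BRAM words — the paper's reshaping scheme may differ in detail.

```cpp
#include <ap_fixed.h>

// (a) Transprecision data-path: reduced-precision fixed-point types
// replace 32-bit float. All bit widths here are illustrative assumptions.
typedef ap_fixed<16, 8>  feat_t;   // activations / features
typedef ap_fixed<8, 2>   weight_t; // model parameters, narrower still
typedef ap_fixed<24, 12> acc_t;    // wider accumulator to bound rounding error

const int N_INPUT  = 128; // hypothetical input-feature width
const int N_HIDDEN = 100; // hypothetical hidden-layer width

// One LSTM-style gate pre-activation: the dot product of an input vector
// with one row of an on-chip weight matrix.
feat_t gate_preactivation(const feat_t x[N_INPUT], int row) {
    // (b) Model parameters stored on-chip; the reshape pragma packs 8
    // narrow weight words into one wide BRAM word so that 8 operands are
    // fetched per cycle. Weight initialization is omitted for brevity;
    // assume the array is loaded at configuration time.
    static weight_t w[N_HIDDEN][N_INPUT];
#pragma HLS ARRAY_RESHAPE variable=w cyclic factor=8 dim=2

    acc_t acc = 0;
dot:
    for (int i = 0; i < N_INPUT; ++i) {
#pragma HLS UNROLL factor=8 // matches the reshape factor: 8 MACs per cycle
        acc += (acc_t)(w[row][i]) * (acc_t)(x[i]);
    }
    return (feat_t)acc; // narrow the result back to the data-path precision
}
```

Accumulating in a wider acc_t than the operand types is the usual way to keep the rounding error of a long dot product from swamping reduced-precision operands, and matching the reshape factor to the unroll factor means each reshaped BRAM word supplies exactly the operands consumed per cycle.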