通过基于fpga的批处理实现高效的深度神经网络加速

2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig) Pub Date : 2016-11-01 DOI:10.1109/ReConFig.2016.7857167

Thorbjörn Posewsky, Daniel Ziener

{"title":"通过基于fpga的批处理实现高效的深度神经网络加速","authors":"Thorbjörn Posewsky, Daniel Ziener","doi":"10.1109/ReConFig.2016.7857167","DOIUrl":null,"url":null,"abstract":"Deep neural networks are an extremely successful and widely used technique for various pattern recognition and machine learning tasks. Due to power and resource constraints, these computationally intensive networks are difficult to implement in embedded systems. Yet, the number of applications that can benefit from the mentioned possibilities is rapidly rising. In this paper, we propose a novel architecture for processing previously learned and arbitrary deep neural networks on FPGA-based SoCs that is able to overcome these limitations. A key contribution of our approach, which we refer to as batch processing, achieves a mitigation of required weight matrix transfers from external memory by reusing weights across multiple input samples. This technique combined with a sophisticated pipelining and the usage of high performance interfaces accelerates the data processing compared to existing approaches on the same FPGA device by one order of magnitude. Furthermore, we achieve a comparable data throughput as a fully featured x86-based system at only a fraction of its energy consumption.","PeriodicalId":431909,"journal":{"name":"2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Efficient deep neural network acceleration through FPGA-based batch processing\",\"authors\":\"Thorbjörn Posewsky, Daniel Ziener\",\"doi\":\"10.1109/ReConFig.2016.7857167\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks are an extremely successful and widely used technique for various pattern recognition and machine learning tasks. Due to power and resource constraints, these computationally intensive networks are difficult to implement in embedded systems. Yet, the number of applications that can benefit from the mentioned possibilities is rapidly rising. In this paper, we propose a novel architecture for processing previously learned and arbitrary deep neural networks on FPGA-based SoCs that is able to overcome these limitations. A key contribution of our approach, which we refer to as batch processing, achieves a mitigation of required weight matrix transfers from external memory by reusing weights across multiple input samples. This technique combined with a sophisticated pipelining and the usage of high performance interfaces accelerates the data processing compared to existing approaches on the same FPGA device by one order of magnitude. Furthermore, we achieve a comparable data throughput as a fully featured x86-based system at only a fraction of its energy consumption.\",\"PeriodicalId\":431909,\"journal\":{\"name\":\"2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig)\",\"volume\":\"76 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ReConFig.2016.7857167\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ReConFig.2016.7857167","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

摘要

深度神经网络是一种非常成功和广泛应用于各种模式识别和机器学习任务的技术。由于功率和资源的限制，这些计算密集型网络很难在嵌入式系统中实现。然而，可以从上述可能性中受益的应用程序数量正在迅速增加。在本文中，我们提出了一种新的架构，用于在基于fpga的soc上处理先前学习过的和任意深度神经网络，能够克服这些限制。我们的方法的一个关键贡献，我们称之为批处理，通过在多个输入样本中重用权重，实现了从外部存储器传输所需的权重矩阵的减少。该技术与复杂的流水线和高性能接口的使用相结合，与相同FPGA设备上的现有方法相比，将数据处理速度提高了一个数量级。此外，我们实现了与功能齐全的基于x86的系统相当的数据吞吐量，而能耗仅为其一小部分。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Efficient deep neural network acceleration through FPGA-based batch processing

Deep neural networks are an extremely successful and widely used technique for various pattern recognition and machine learning tasks. Due to power and resource constraints, these computationally intensive networks are difficult to implement in embedded systems. Yet, the number of applications that can benefit from the mentioned possibilities is rapidly rising. In this paper, we propose a novel architecture for processing previously learned and arbitrary deep neural networks on FPGA-based SoCs that is able to overcome these limitations. A key contribution of our approach, which we refer to as batch processing, achieves a mitigation of required weight matrix transfers from external memory by reusing weights across multiple input samples. This technique combined with a sophisticated pipelining and the usage of high performance interfaces accelerates the data processing compared to existing approaches on the same FPGA device by one order of magnitude. Furthermore, we achieve a comparable data throughput as a fully featured x86-based system at only a fraction of its energy consumption.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig)

自引率

0.00%

发文量