{"title":"通过基于fpga的批处理实现高效的深度神经网络加速","authors":"Thorbjörn Posewsky, Daniel Ziener","doi":"10.1109/ReConFig.2016.7857167","DOIUrl":null,"url":null,"abstract":"Deep neural networks are an extremely successful and widely used technique for various pattern recognition and machine learning tasks. Due to power and resource constraints, these computationally intensive networks are difficult to implement in embedded systems. Yet, the number of applications that can benefit from the mentioned possibilities is rapidly rising. In this paper, we propose a novel architecture for processing previously learned and arbitrary deep neural networks on FPGA-based SoCs that is able to overcome these limitations. A key contribution of our approach, which we refer to as batch processing, achieves a mitigation of required weight matrix transfers from external memory by reusing weights across multiple input samples. This technique combined with a sophisticated pipelining and the usage of high performance interfaces accelerates the data processing compared to existing approaches on the same FPGA device by one order of magnitude. Furthermore, we achieve a comparable data throughput as a fully featured x86-based system at only a fraction of its energy consumption.","PeriodicalId":431909,"journal":{"name":"2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Efficient deep neural network acceleration through FPGA-based batch processing\",\"authors\":\"Thorbjörn Posewsky, Daniel Ziener\",\"doi\":\"10.1109/ReConFig.2016.7857167\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks are an extremely successful and widely used technique for various pattern recognition and machine learning tasks. Due to power and resource constraints, these computationally intensive networks are difficult to implement in embedded systems. Yet, the number of applications that can benefit from the mentioned possibilities is rapidly rising. In this paper, we propose a novel architecture for processing previously learned and arbitrary deep neural networks on FPGA-based SoCs that is able to overcome these limitations. A key contribution of our approach, which we refer to as batch processing, achieves a mitigation of required weight matrix transfers from external memory by reusing weights across multiple input samples. This technique combined with a sophisticated pipelining and the usage of high performance interfaces accelerates the data processing compared to existing approaches on the same FPGA device by one order of magnitude. Furthermore, we achieve a comparable data throughput as a fully featured x86-based system at only a fraction of its energy consumption.\",\"PeriodicalId\":431909,\"journal\":{\"name\":\"2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig)\",\"volume\":\"76 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ReConFig.2016.7857167\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ReConFig.2016.7857167","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Efficient deep neural network acceleration through FPGA-based batch processing
Deep neural networks are an extremely successful and widely used technique for various pattern recognition and machine learning tasks. Due to power and resource constraints, these computationally intensive networks are difficult to implement in embedded systems. Yet, the number of applications that can benefit from the mentioned possibilities is rapidly rising. In this paper, we propose a novel architecture for processing previously learned and arbitrary deep neural networks on FPGA-based SoCs that is able to overcome these limitations. A key contribution of our approach, which we refer to as batch processing, achieves a mitigation of required weight matrix transfers from external memory by reusing weights across multiple input samples. This technique combined with a sophisticated pipelining and the usage of high performance interfaces accelerates the data processing compared to existing approaches on the same FPGA device by one order of magnitude. Furthermore, we achieve a comparable data throughput as a fully featured x86-based system at only a fraction of its energy consumption.