Title: An OpenCL-Based Hybrid CNN-RNN Inference Accelerator On FPGA
Authors: Yunfei Sun, Brian Liu, Xianchao Xu
Venue: 2019 International Conference on Field-Programmable Technology (ICFPT)
DOI: 10.1109/ICFPT47387.2019.00048
Published: 2019-12-01
An OpenCL-Based Hybrid CNN-RNN Inference Accelerator On FPGA
Recently, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and CNN-RNN hybrid networks have demonstrated great success in many deep learning scenarios. Although many dedicated FPGA accelerators for a single kind of network have been proposed, few of them combine CNN and RNN acceleration. In this paper, we propose a high-throughput and resource-efficient CNN-RNN fusion accelerator on FPGA, built with commercial OpenCL, to support general-purpose DNNs. It utilizes a novel streaming architecture and mapping strategy to implement the most computation-intensive and resource-demanding parts of DNNs on the same computation logic. Through this hardware-reuse method, it achieves resource efficiency in accelerating CNNs, RNNs, and their hybrid networks. Our accelerator follows a layer-by-layer, subgraph-by-subgraph, or subnetwork-by-subnetwork execution mode, which enables it to deploy most DNNs flexibly at runtime with the best performance. YOLOv2, LSTM, and CRNN are evaluated with our accelerator on an Intel Arria 10 GX1150 FPGA. It achieves 646 GOPS throughput on CRNN, the best performance on CNN-RNN hybrid networks among high-level-synthesis (HLS) based FPGA accelerators. Moreover, its throughput for CNNs and RNNs is competitive with state-of-the-art specialized FPGA accelerators.
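The hardware-reuse idea in the abstract can be illustrated in software. The sketch below is not the paper's implementation: it is a minimal NumPy illustration, under the common assumption that convolution is lowered to a matrix multiply via im2col and that LSTM gate pre-activations are matrix-vector products, so both workloads can share one GEMM-style compute routine (here the hypothetical `gemm` function stands in for the reused FPGA compute logic).

```python
import numpy as np

def gemm(a, b):
    # The single shared compute kernel. On the FPGA this would be the
    # reused streaming computation logic rather than a NumPy call.
    return a @ b

def conv2d_via_gemm(x, w):
    # Lower a 2D convolution (cross-correlation, as in CNN layers)
    # to GEMM via im2col. x: H x W input, w: kh x kw kernel.
    kh, kw = w.shape
    h, w_ = x.shape
    oh, ow = h - kh + 1, w_ - kw + 1
    # im2col: each output position becomes one unrolled patch row.
    cols = np.stack([x[i:i + kh, j:j + kw].ravel()
                     for i in range(oh) for j in range(ow)])
    return gemm(cols, w.ravel()).reshape(oh, ow)

def lstm_gates_via_gemm(x_t, h_prev, W, U, b):
    # Lower the stacked LSTM gate pre-activations to the same GEMM:
    # W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,).
    return gemm(W, x_t) + gemm(U, h_prev) + b

# Both the CNN and RNN paths run through the same gemm() routine:
x = np.arange(9, dtype=float).reshape(3, 3)
print(conv2d_via_gemm(x, np.ones((2, 2))))        # 2x2 patch sums
print(lstm_gates_via_gemm(np.array([1.0, 2.0]),
                          np.zeros(1),
                          np.ones((4, 2)),
                          np.zeros((4, 1)),
                          np.zeros(4)))
```

Routing both layer types through one routine is the software analogue of the paper's reuse of a single block of computation logic for CNN, RNN, and hybrid networks.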