A Deep Residual Networks Accelerator on FPGA
Yaqian Zhao, Xin Zhang, Xing Fang, Long Li, Xuelei Li, Zhenhua Guo, Xucheng Liu
2019 Eleventh International Conference on Advanced Computational Intelligence (ICACI), published 2019-06-07
DOI: 10.1109/ICACI.2019.8778613 (https://doi.org/10.1109/ICACI.2019.8778613)
Deep residual networks play an important role in deep learning and are widely used for image classification because of their high recognition rate. As the amount of data in data centers and embedded systems grows, performance and power consumption become key issues. FPGAs are a promising platform for accelerating deep learning inference because of their low latency and low energy consumption. In this paper, we present an OpenCL-based acceleration framework on FPGA for deep residual networks that delivers high performance and high energy efficiency. We also propose a new strategy for handling fully-connected layers and an optimization strategy for 1×1 filters. To validate our proposal, we evaluate the framework on Intel Arria 10 devices. The results show that ResNet50 on our framework achieves 54 img/s, or 1.2 img/s/W, which is 47% higher than the state-of-the-art FPGA-based design on the same device. This result is also competitive with NVIDIA's M4 GPUs.
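The abstract does not say how the 1×1 filter optimization works, so the following is not the authors' method. It is only a minimal sketch of a well-known property that makes 1×1 filters a natural target for special handling: a 1×1 convolution has no spatial window, so it reduces to a single matrix product between a C_out × C_in weight matrix and a C_in × (H·W) matrix of pixels. The function names `conv2d` and `conv1x1_as_matmul`, the tensor layouts, and the toy data are assumptions chosen for illustration (stride 1, no padding, no bias).

```python
def conv2d(x, w):
    """Direct convolution, stride 1, no padding (illustrative reference).

    x: input as nested lists  [C_in][H][W]
    w: filters as nested lists [C_out][C_in][K][K]
    Returns [C_out][H-K+1][W-K+1].
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    out_h, out_w = h - k + 1, wd - k + 1
    out = []
    for f in w:
        plane = [[0.0] * out_w for _ in range(out_h)]
        for i in range(out_h):
            for j in range(out_w):
                acc = 0.0
                for c in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            acc += f[c][di][dj] * x[c][i + di][j + dj]
                plane[i][j] = acc
        out.append(plane)
    return out


def conv1x1_as_matmul(x, w):
    """1x1 convolution computed as (C_out x C_in) @ (C_in x H*W)."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    # Flatten every input channel into one row: C_in x (H*W).
    pixels = [[x[c][i][j] for i in range(h) for j in range(wd)]
              for c in range(c_in)]
    # Drop the trivial 1x1 spatial dimensions of the filters: C_out x C_in.
    weights = [[f[c][0][0] for c in range(c_in)] for f in w]
    # Plain matrix product, one output row per filter.
    flat = [[sum(row[c] * pixels[c][p] for c in range(c_in))
             for p in range(h * wd)]
            for row in weights]
    # Reshape each output row back to H x W.
    return [[r[i * wd:(i + 1) * wd] for i in range(h)] for r in flat]


# Toy example: 2 input channels of size 2x2, 3 filters of size 1x1.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[5.0, 6.0], [7.0, 8.0]]]
w = [[[[1.0]], [[0.0]]],    # copies channel 0
     [[[0.0]], [[1.0]]],    # copies channel 1
     [[[1.0]], [[-1.0]]]]   # channel 0 minus channel 1

assert conv1x1_as_matmul(x, w) == conv2d(x, w)
```

In this form there is no sliding-window buffering at all, which is one reason 1×1 layers, common in ResNet50 bottleneck blocks, are often treated separately from 3×3 layers in accelerator designs.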