{"title":"基于软件定义fpga的深度卷积神经网络加速器(摘要)","authors":"Yankang Du, Qinrang Liu, Shuai Wei, Chen Gao","doi":"10.1145/3174243.3174983","DOIUrl":null,"url":null,"abstract":"Now, Convolutional Neural Network (CNN) has gained great popularity. Intensive computation and huge external data access amount are two challenged factors for the hardware acceleration. Besides these, the ability to deal with various CNN models is also challenged. At present, most of the proposed FPGA-based CNN accelerator either can only deal with specific CNN models or should be re-coded and re-download on the FPGA for the different CNN models. This would bring great trouble for the developers. In this paper, we designed a software-defined architecture to cope with different CNN models while keeping high throughput. The hardware can be programmed according to the requirement. Several techniques are proposed to optimize the performance of our accelerators. For the convolutional layer, we proposed the software-defined data reuse technique to ensure that all the parameters can be only loaded once during the computing phase. This will reduce large off-chip data access amount and the need for the memory and the need for the memory bandwidth. By using the sparse property of the input feature map, almost 80% weight parameters can be skipped to be loaded in the full-connected (FC) layer. Compared to the previous works, our software-defined accelerator has the highest flexibility while keeping relative high throughout. Besides this, our accelerator also has lower off-chip data access amount which has a great effect on the power consumption.","PeriodicalId":164936,"journal":{"name":"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Software-Defined FPGA-Based Accelerator for Deep Convolutional Neural Networks: (Abstract Only)\",\"authors\":\"Yankang Du, Qinrang Liu, Shuai Wei, Chen Gao\",\"doi\":\"10.1145/3174243.3174983\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Now, Convolutional Neural Network (CNN) has gained great popularity. Intensive computation and huge external data access amount are two challenged factors for the hardware acceleration. Besides these, the ability to deal with various CNN models is also challenged. At present, most of the proposed FPGA-based CNN accelerator either can only deal with specific CNN models or should be re-coded and re-download on the FPGA for the different CNN models. This would bring great trouble for the developers. In this paper, we designed a software-defined architecture to cope with different CNN models while keeping high throughput. The hardware can be programmed according to the requirement. Several techniques are proposed to optimize the performance of our accelerators. For the convolutional layer, we proposed the software-defined data reuse technique to ensure that all the parameters can be only loaded once during the computing phase. This will reduce large off-chip data access amount and the need for the memory and the need for the memory bandwidth. By using the sparse property of the input feature map, almost 80% weight parameters can be skipped to be loaded in the full-connected (FC) layer. Compared to the previous works, our software-defined accelerator has the highest flexibility while keeping relative high throughout. 
Besides this, our accelerator also has lower off-chip data access amount which has a great effect on the power consumption.\",\"PeriodicalId\":164936,\"journal\":{\"name\":\"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-02-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3174243.3174983\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3174243.3174983","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Software-Defined FPGA-Based Accelerator for Deep Convolutional Neural Networks: (Abstract Only)
Convolutional Neural Networks (CNNs) have gained great popularity. Intensive computation and a large amount of external data access are two challenging factors for hardware acceleration; beyond these, the ability to handle a variety of CNN models is also a challenge. At present, most proposed FPGA-based CNN accelerators either support only specific CNN models or must be re-coded and re-downloaded onto the FPGA for each new model, which creates great trouble for developers. In this paper, we design a software-defined architecture that copes with different CNN models while maintaining high throughput; the hardware can be programmed according to the requirements. Several techniques are proposed to optimize the performance of our accelerator. For the convolutional layers, we propose a software-defined data-reuse technique that ensures all parameters are loaded only once during the computation phase, which greatly reduces off-chip data access as well as the required memory capacity and memory bandwidth. By exploiting the sparsity of the input feature map, almost 80% of the weight parameters need not be loaded in the fully connected (FC) layers. Compared with previous work, our software-defined accelerator offers the highest flexibility while keeping relatively high throughput. Moreover, our accelerator also has a lower off-chip data access volume, which significantly reduces power consumption.
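
To illustrate the data-reuse idea described for the convolutional layers, here is a minimal Python sketch, not the paper's hardware design: it assumes a simple weight-stationary schedule in which all filter weights cross the off-chip boundary exactly once (via a hypothetical load_filters() callback) and are then reused for every output position.

    import numpy as np

    # Illustrative sketch only (assumed weight-stationary schedule, hypothetical
    # load_filters callback): filter weights are fetched from off-chip memory a
    # single time and then reused for every output position, so they never need
    # to be re-loaded during the computation phase.
    def conv2d_weights_loaded_once(ifmap, load_filters):
        """ifmap: (C, H, W) input feature map; load_filters() returns (K, C, R, S)."""
        weights = load_filters()              # the only off-chip weight transfer
        K, C, R, S = weights.shape
        _, H, W = ifmap.shape
        out = np.zeros((K, H - R + 1, W - S + 1))
        for k in range(K):
            for y in range(out.shape[1]):
                for x in range(out.shape[2]):
                    # weights[k] stays "on chip" and is reused at every (y, x)
                    out[k, y, x] = np.sum(weights[k] * ifmap[:, y:y + R, x:x + S])
        return out

    rng = np.random.default_rng(0)
    ifmap = rng.standard_normal((3, 8, 8))
    filters = rng.standard_normal((4, 3, 3, 3))
    out = conv2d_weights_loaded_once(ifmap, lambda: filters)
    print(out.shape)  # (4, 6, 6); the weights were loaded exactly once

In hardware, the same idea amortizes a single DRAM transfer of the weights over the entire output feature map, which is what lowers the memory-bandwidth requirement.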
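
The FC-layer saving can be sketched in the same spirit, again as an assumption-laden illustration rather than the authors' implementation: a weight column is fetched (here through a hypothetical load_weight_column(j) callback) only when its input activation is nonzero, so with roughly 80% zero activations about 80% of the weight loads are skipped.

    import numpy as np

    # Illustrative sketch only: skip the off-chip load of a weight column whenever
    # the corresponding input activation is zero.
    def fc_sparse(x, load_weight_column):
        """x: 1-D input activations; load_weight_column(j) fetches column j of W."""
        out, loads = None, 0
        for j, a in enumerate(x):
            if a == 0.0:
                continue                      # zero activation: no load, no MAC
            col = load_weight_column(j)       # fetch only the needed column
            loads += 1
            out = a * col if out is None else out + a * col
        return out, loads

    rng = np.random.default_rng(1)
    W = rng.standard_normal((256, 1024))      # (out_features, in_features), stand-in for DRAM
    x = rng.standard_normal(1024)
    x[rng.random(1024) < 0.8] = 0.0           # ~80% sparse activations, e.g. after ReLU
    y, loads = fc_sparse(x, lambda j: W[:, j])
    assert np.allclose(y, W @ x)
    print(f"loaded {loads} of {W.shape[1]} weight columns")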