An FPGA-based accelerator platform implements for convolutional neural network
Authors: Xiao Meng, Lixin Yu, Zhiyong Qin
Published in: Proceedings of the 3rd International Conference on High Performance Compilation, Computing and Communications, 2019-03-08
DOI: 10.1145/3318265.3318285 (https://doi.org/10.1145/3318265.3318285)
Citations: 0
Abstract
In recent years, convolutional neural networks (CNNs) have become widely used in a large number of applications, including computer vision, natural language processing, and autonomous driving. However, CNN-based methods are computation-intensive and resource-intensive, which makes them hard to integrate into embedded systems such as smartphones, autonomous vehicles, and robots. To address this limitation, various deep learning accelerators have been proposed for implementation on field-programmable gate array (FPGA) platforms because of their flexibility and reconfigurability. In this paper, we design and implement an FPGA-based accelerator platform that integrates the NVIDIA Deep Learning Accelerator (NVDLA). We describe the detailed architecture of the accelerator and present hardware/software co-design approaches that can guide the system design of FPGA-based accelerator platforms. As a case study, we implement the CNN accelerator on an XCZU9EG FPGA platform; our implementation achieves a peak performance of 25.6 GOPS when computing the valid output of convolutional layers at a working frequency of 100 MHz.
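The peak figure quoted in the abstract can be related to per-cycle throughput with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the convention that one multiply-accumulate (MAC) counts as two operations is an assumption that the abstract does not state. Under that assumption, 25.6 GOPS at 100 MHz corresponds to 256 operations per cycle, or 128 MACs per cycle.

```python
def ops_per_cycle(peak_gops: float, clock_hz: float) -> float:
    """Operations completed per clock cycle for a given peak throughput."""
    return peak_gops * 1e9 / clock_hz


def peak_gops_from_macs(macs_per_cycle: int, clock_hz: float,
                        ops_per_mac: int = 2) -> float:
    """Peak throughput in GOPS.

    ops_per_mac=2 is an assumed convention (one multiply plus one add);
    the abstract does not say how operations are counted.
    """
    return macs_per_cycle * ops_per_mac * clock_hz / 1e9


if __name__ == "__main__":
    clock_hz = 100e6   # 100 MHz working frequency, from the abstract
    peak = 25.6        # peak GOPS, from the abstract

    per_cycle = ops_per_cycle(peak, clock_hz)
    print(f"operations per cycle: {per_cycle:.0f}")
    print(f"implied MACs per cycle (assuming 2 ops/MAC): {per_cycle / 2:.0f}")
    print(f"round trip: {peak_gops_from_macs(128, clock_hz):.1f} GOPS")
```

If operations were instead counted as one per MAC, the same figure would imply 256 MACs per cycle, so the MAC count should be read from the paper itself rather than inferred from this calculation.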