An FPGA-based accelerator platform implements for convolutional neural network

Proceedings of the 3rd International Conference on High Performance Compilation, Computing and Communications; Pub Date: 2019-03-08; DOI: 10.1145/3318265.3318285

Xiao Meng, Lixin Yu, Zhiyong Qin



Abstract

In recent years, convolutional neural networks (CNNs) have become widely used in a large number of applications, including computer vision, natural language processing, and autonomous driving. However, CNN-based methods are computation-intensive and resource-intensive, which makes it hard to integrate neural networks into embedded systems such as smartphones, autonomous vehicles, and robots. To address this limitation, various deep learning accelerators have been proposed for implementation on field-programmable gate array (FPGA) platforms because of their flexibility and reconfigurability. In this paper, we design and implement an FPGA-based accelerator platform that integrates the NVIDIA Deep Learning Accelerator (NVDLA). We describe the detailed architecture of the accelerator and present hardware/software co-design approaches that can guide the system design of FPGA-based accelerator platforms. As a case study, we implement the CNN accelerator on an XCZU9EG FPGA platform; our implementation achieves a peak performance of 25.6 GOPS when computing the valid output of convolutional layers at a working frequency of 100 MHz.
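As a rough sanity check on the figures quoted in the abstract, the sketch below converts the reported peak throughput and clock frequency into operations per clock cycle. The function name `ops_per_cycle` and the constant `OPS_PER_MAC` are illustrative, not taken from the paper. Counting one multiply-accumulate (MAC) as two operations is a common convention, but it is an assumption here: the abstract does not say how operations are counted or how many MAC units the design uses.

```python
def ops_per_cycle(peak_gops: float, clock_mhz: float) -> float:
    """Return operations completed per clock cycle at the given peak throughput."""
    return (peak_gops * 1e9) / (clock_mhz * 1e6)


# Assumption (not stated in the abstract): one MAC is counted as 2 operations.
OPS_PER_MAC = 2

# Figures quoted in the abstract: 25.6 GOPS peak at a 100 MHz clock.
ops = ops_per_cycle(25.6, 100.0)
macs = ops / OPS_PER_MAC

print(f"{ops:.0f} ops/cycle, about {macs:.0f} MACs/cycle under the 2-ops-per-MAC assumption")
```

Under that counting convention, the reported numbers would correspond to roughly 128 MACs completing per cycle. A different convention would change the MAC estimate but not the operations-per-cycle figure.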

