Memory Optimization Techniques for FPGA based CNN Implementations

Masoud Shahshahani, Pingakshya Goswami, D. Bhatia
{"title":"Memory Optimization Techniques for FPGA based CNN Implementations","authors":"Masoud Shahshahani, Pingakshya Goswami, D. Bhatia","doi":"10.1109/DCAS.2018.8620112","DOIUrl":null,"url":null,"abstract":"Deep Learning has played an important role in the classification of images, speech recognition, and natural language processing. Traditionally, these learning algorithms are implemented in clusters of CPUs and GPUs. But with the increase in data size, the models created on CPUs and GPUs are not scalable. Hence we need a hardware model which can be scaled beyond current data and model sizes. This is where FPGA comes into place. With the advancement of CAD tools for FPGAs, the designers do not need to create the architectures of the networks in RTL level using HDLs like Verilog and VHDL. They can use High-level Language like C or C++ to build the models using tools like Xilinx Vivado HLS. Also, the power consumption of FPGA based models for deep learning is substantially low as compared to GPUs. In this paper, we have done an extensive survey of various implementations of FPGA based deep learning architectures with emphasis on Convolutional Neural Networks (CNN). The CNN architectures presented in the literature consume large memory for the storage of weights and images. It is not possible to store this information in the internal FPGA Block RAM. This paper presents comprehensive servery of the methods and techniques used in literatures to tackle the memory consumption issue and how the data movement between high storage external DDR memory and internal BRAM can be reduced.","PeriodicalId":320317,"journal":{"name":"2018 IEEE 13th Dallas Circuits and Systems Conference (DCAS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 13th Dallas Circuits and Systems Conference (DCAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCAS.2018.8620112","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

Deep learning has played an important role in image classification, speech recognition, and natural language processing. Traditionally, these learning algorithms are implemented on clusters of CPUs and GPUs, but as data sizes grow, models built on CPUs and GPUs do not scale. Hence we need a hardware platform that can scale beyond current data and model sizes, and this is where FPGAs come into play. With the advancement of CAD tools for FPGAs, designers no longer need to describe network architectures at the RTL level in HDLs such as Verilog and VHDL; they can build their models in a high-level language such as C or C++ using tools like Xilinx Vivado HLS. Moreover, the power consumption of FPGA-based deep learning implementations is substantially lower than that of GPUs. In this paper, we present an extensive survey of FPGA-based deep learning implementations, with emphasis on convolutional neural networks (CNNs). The CNN architectures presented in the literature consume large amounts of memory for the storage of weights and images, and it is not possible to hold all of this data in the FPGA's internal block RAM (BRAM). This paper presents a comprehensive survey of the methods and techniques used in the literature to tackle this memory consumption issue and to reduce data movement between high-capacity external DDR memory and internal BRAM.
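The memory-optimization pattern the abstract alludes to, keeping weights and feature-map tiles in on-chip BRAM so each DDR word is fetched once per tile rather than once per multiply, is commonly expressed in Vivado HLS C++ through loop tiling and burst transfers. Below is a minimal sketch of that pattern for one output tile of a single-channel convolution; the tile sizes, buffer names, and function signature are illustrative assumptions for this page, not code from the paper.

```cpp
// Hypothetical tiled convolution kernel in Vivado HLS style C++.
// TILE, K, and all identifiers are assumed for illustration.
#include <cstring>

constexpr int K    = 3;             // convolution kernel size (assumed)
constexpr int TILE = 32;            // output tile edge length (assumed)
constexpr int IN_T = TILE + K - 1;  // input tile edge including halo

// Computes one TILE x TILE output tile at (row0, col0). 'ddr_in',
// 'ddr_w', and 'ddr_out' live in external DDR; the local arrays are
// small on-chip BRAM buffers. The caller is assumed to keep tiles
// in bounds (or pad the image).
void conv_tile(const float *ddr_in, const float *ddr_w, float *ddr_out,
               int img_w, int row0, int col0) {
#pragma HLS INTERFACE m_axi port=ddr_in  bundle=gmem0
#pragma HLS INTERFACE m_axi port=ddr_w   bundle=gmem1
#pragma HLS INTERFACE m_axi port=ddr_out bundle=gmem0

    float in_buf[IN_T][IN_T];   // input tile + halo, held in BRAM
    float w_buf[K][K];          // kernel weights, fully partitioned
    float out_buf[TILE][TILE];  // output tile, written back in bursts
#pragma HLS ARRAY_PARTITION variable=w_buf complete dim=0

    // Burst-read the weights and the input tile (row by row) from DDR.
    std::memcpy(w_buf, ddr_w, sizeof(w_buf));
    for (int r = 0; r < IN_T; ++r)
        std::memcpy(in_buf[r], ddr_in + (row0 + r) * img_w + col0,
                    IN_T * sizeof(float));

    // Compute entirely out of the on-chip buffers: no DDR access
    // inside the pipelined inner loop.
    for (int r = 0; r < TILE; ++r) {
        for (int c = 0; c < TILE; ++c) {
#pragma HLS PIPELINE II=1
            float acc = 0.0f;
            for (int kr = 0; kr < K; ++kr)
                for (int kc = 0; kc < K; ++kc)
                    acc += w_buf[kr][kc] * in_buf[r + kr][c + kc];
            out_buf[r][c] = acc;
        }
    }

    // Burst-write the finished tile back to DDR, row by row.
    for (int r = 0; r < TILE; ++r)
        std::memcpy(ddr_out + (row0 + r) * img_w + col0, out_buf[r],
                    TILE * sizeof(float));
}
```

Designs in the surveyed literature typically extend this basic pattern with double buffering (ping-pong buffers), so the burst transfers for the next tile overlap with computation on the current one, and with tile sizes chosen to balance BRAM capacity against DDR bandwidth.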