Exploring a Layer-based Pre-implemented Flow for Mapping CNN on FPGA
Danielle Tchuinkou Kwadjo, Joel Mandebi Mbongue, C. Bobda
2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), June 2021. DOI: 10.1109/IPDPSW52791.2021.00025
Abstract
Convolutional Neural Networks are compute-intensive learning models that have demonstrated their effectiveness in solving complex learning problems. However, developing a high-performance FPGA accelerator for a CNN often demands advanced programming skills, hardware verification, precise distribution localization, and long development cycles. Moreover, CNN depth increases through the reuse and replication of multiple layers. This paper proposes a programming flow for CNNs on FPGAs that generates high-performance accelerators by assembling pre-implemented CNN components, like the pieces of a puzzle, according to the network's graph topology. Using pre-implemented components allows us to use only the minimum resources necessary, to predict performance, and to gain productivity, since there is no need to synthesize any HDL code. Furthermore, components can be reused across a range of applications. Through prototyping, we demonstrate the viability and relevance of our approach. Experiments show a productivity improvement of up to 69% compared to a traditional FPGA implementation, while achieving over 1.75× higher Fmax with lower resource and power consumption.
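To make the assembly idea concrete, the following is a minimal Python sketch, not the authors' tool, of mapping a CNN graph onto a library of pre-implemented layer components. All names (PreImplementedComponent, LayerNode, assemble) and the resource/Fmax figures are hypothetical; the point is that, with placed-and-routed components, resource usage and timing can be estimated by composition rather than by re-synthesizing HDL.

```python
# Hypothetical sketch of a layer-based pre-implemented assembly flow.
# Each CNN layer type maps to a component whose resources and Fmax are
# already known from pre-implementation; the accelerator is assembled by
# walking the network graph and reusing components per layer type.
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class PreImplementedComponent:
    layer_type: str      # e.g. "conv3x3", "maxpool", "fc"
    luts: int            # resources known from pre-implementation
    ffs: int
    fmax_mhz: float      # timing is predictable: no re-synthesis needed

@dataclass
class LayerNode:
    name: str
    layer_type: str
    predecessors: List[str]

def assemble(graph: List[LayerNode],
             library: Dict[str, PreImplementedComponent]
             ) -> Tuple[list, dict]:
    """Map each CNN layer onto a matching pre-implemented component.

    Total resource use is the sum of component costs, and the design Fmax
    is bounded by the slowest component (inter-component routing delay is
    ignored in this sketch)."""
    placement, luts, ffs, fmax = [], 0, 0, float("inf")
    for node in graph:                    # assumes topological order
        comp = library[node.layer_type]   # reuse: one component per type
        placement.append((node.name, comp))
        luts += comp.luts
        ffs += comp.ffs
        fmax = min(fmax, comp.fmax_mhz)
    return placement, {"luts": luts, "ffs": ffs, "fmax_mhz": fmax}

if __name__ == "__main__":
    lib = {"conv3x3": PreImplementedComponent("conv3x3", 4200, 5100, 320.0),
           "maxpool": PreImplementedComponent("maxpool", 600, 800, 400.0),
           "fc":      PreImplementedComponent("fc", 3000, 3500, 350.0)}
    net = [LayerNode("conv1", "conv3x3", []),
           LayerNode("pool1", "maxpool", ["conv1"]),
           LayerNode("fc1", "fc", ["pool1"])]
    print(assemble(net, lib)[1])
```

Under these assumptions, estimating a candidate accelerator reduces to a graph traversal over already-characterized components, which is where the productivity and predictability claims of the flow come from.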