Energy-Efficient Deep Neural Networks Implementation on a Scalable Heterogeneous FPGA Cluster

Yanbu Hu, C. Shao, Huiyun Li
{"title":"Energy-Efficient Deep Neural Networks Implementation on a Scalable Heterogeneous FPGA Cluster","authors":"Yanbu Hu, C. Shao, Huiyun Li","doi":"10.1109/asid52932.2021.9651719","DOIUrl":null,"url":null,"abstract":"In recent years, with the rapid development of DNN, the algorithm complexity in a series of fields such as computer vision and natural language processing is increasing rapidly. FPGA-based DNN accelerators have demonstrated superior flexibility and performance, with higher energy efficiency compared to high-performance devices such as GPU. However, the computing resources of a single FPGA are limited and it is difficult to flexibly meet the requirements of high throughput and high energy efficiency of different computing scales. Therefore, this paper proposes a DNN implementation method based on the scalable heterogeneous FPGA cluster to adapt to different tasks and achieve high throughput and energy efficiency. Firstly, the method divides a single enormous task into multiple modules and running each module on different FPGA as the pipeline structure between multiple boards. Secondly, a task deployment method based on dichotomy is proposed to maximize the balance of task execution time of different pipeline stages to improve throughput and energy efficiency. Thirdly, optimize DNN computing module according to the relationship between computing power and bandwidth, and improve energy efficiency by reducing waste of ineffective resources and improving resource utilization. The experiment results on Alexnet and VGG-16 demonstrate that we use Zynq 7035 cluster can at most achieves ×25.23 energy efficiency of optimized AMD AIO processor. Compared with previous works of single FPGA and FPGA cluster, the energy efficiency is improved by 59.5% and 18.8%, respectively.","PeriodicalId":150884,"journal":{"name":"2021 IEEE 15th International Conference on Anti-counterfeiting, Security, and Identification (ASID)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 15th International Conference on Anti-counterfeiting, Security, and Identification (ASID)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/asid52932.2021.9651719","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In recent years, with the rapid development of DNN, the algorithm complexity in a series of fields such as computer vision and natural language processing is increasing rapidly. FPGA-based DNN accelerators have demonstrated superior flexibility and performance, with higher energy efficiency compared to high-performance devices such as GPU. However, the computing resources of a single FPGA are limited and it is difficult to flexibly meet the requirements of high throughput and high energy efficiency of different computing scales. Therefore, this paper proposes a DNN implementation method based on the scalable heterogeneous FPGA cluster to adapt to different tasks and achieve high throughput and energy efficiency. Firstly, the method divides a single enormous task into multiple modules and running each module on different FPGA as the pipeline structure between multiple boards. Secondly, a task deployment method based on dichotomy is proposed to maximize the balance of task execution time of different pipeline stages to improve throughput and energy efficiency. Thirdly, optimize DNN computing module according to the relationship between computing power and bandwidth, and improve energy efficiency by reducing waste of ineffective resources and improving resource utilization. The experiment results on Alexnet and VGG-16 demonstrate that we use Zynq 7035 cluster can at most achieves ×25.23 energy efficiency of optimized AMD AIO processor. Compared with previous works of single FPGA and FPGA cluster, the energy efficiency is improved by 59.5% and 18.8%, respectively.
基于可扩展异构FPGA集群的节能深度神经网络实现
近年来,随着深度神经网络的快速发展,计算机视觉、自然语言处理等一系列领域的算法复杂度迅速增加。与GPU等高性能器件相比,基于fpga的DNN加速器具有更高的能效,具有优越的灵活性和性能。然而,单个FPGA的计算资源有限,难以灵活满足不同计算规模的高吞吐量和高能效要求。因此,本文提出了一种基于可扩展异构FPGA集群的深度神经网络实现方法,以适应不同的任务,实现高吞吐量和高能效。首先,该方法将单个庞大的任务划分为多个模块,并将每个模块作为多板之间的流水线结构在不同的FPGA上运行。其次,提出了一种基于二分法的任务部署方法,最大限度地平衡不同管道阶段的任务执行时间,以提高吞吐量和能源效率;第三,根据计算能力与带宽的关系对DNN计算模块进行优化,通过减少无效资源的浪费,提高资源利用率来提高能源效率。在Alexnet和VGG-16上的实验结果表明,我们使用Zynq 7035集群最多可以达到×25.23优化后的AMD AIO处理器的能效。与以往单FPGA和FPGA集群的工作相比,能效分别提高59.5%和18.8%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信