A Power Efficient Neural Network Implementation on Heterogeneous FPGA and GPU Devices

Y. Tu, Saad Sadiq, Yudong Tao, M. Shyu, Shu‐Ching Chen
{"title":"A Power Efficient Neural Network Implementation on Heterogeneous FPGA and GPU Devices","authors":"Y. Tu, Saad Sadiq, Yudong Tao, M. Shyu, Shu‐Ching Chen","doi":"10.1109/IRI.2019.00040","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) have seen tremendous industrial successes in various applications, including image recognition, machine translation, audio processing, etc. However, they require massive amounts of computations and take a lot of time to process. This quickly becomes a problem in mobile and handheld devices where real-time multimedia applications such as face detection, disaster management, and CCTV require lightweight, fast, and effective computing solutions. The objective of this project is to utilize specialized devices such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) in a heterogeneous computing environment to accelerate the deep learning computations with the constraints of power efficiency. We investigate an efficient DNN implementation and make use of FPGA for fully-connected layer and GPU for floating-point operations. This requires the deep neural network architecture to be implemented in a model parallelism system where the DNN model is broken down and processed in a distributed fashion. The proposed heterogeneous framework idea is implemented using an Nvidia TX2 GPU and a Xilinx Artix-7 FPGA. Experimental results indicate that the proposed framework can achieve faster computation and much lower power consumption.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IRI.2019.00040","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

Deep neural networks (DNNs) have seen tremendous industrial success in various applications, including image recognition, machine translation, and audio processing. However, they require massive amounts of computation and considerable processing time. This quickly becomes a problem on mobile and handheld devices, where real-time multimedia applications such as face detection, disaster management, and CCTV require lightweight, fast, and effective computing solutions. The objective of this project is to utilize specialized devices such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) in a heterogeneous computing environment to accelerate deep learning computations under power-efficiency constraints. We investigate an efficient DNN implementation that uses the FPGA for the fully-connected layer and the GPU for floating-point operations. This requires the DNN architecture to be implemented as a model-parallel system in which the model is partitioned and processed in a distributed fashion. The proposed heterogeneous framework is implemented using an Nvidia TX2 GPU and a Xilinx Artix-7 FPGA. Experimental results indicate that the proposed framework achieves faster computation and much lower power consumption.
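The abstract only sketches the partitioning, so below is a minimal, hypothetical Python sketch (not the authors' implementation) of the model-parallel split it describes: floating-point-heavy layers stay on the GPU while the fully-connected layer is offloaded to the FPGA. The FPGA path is mocked here with 8-bit integer arithmetic in NumPy to mimic fixed-point hardware; all names (GPUConvBlock, FPGAFullyConnected, run_heterogeneous) are assumptions made for illustration only.

```python
# Hypothetical sketch of the GPU/FPGA model-parallel split described in the
# abstract. Real systems would dispatch to the TX2 GPU and the Artix-7 FPGA;
# here both partitions are simulated in NumPy.
import numpy as np


class GPUConvBlock:
    """Stand-in for the floating-point layers that would run on the TX2 GPU."""

    def __init__(self, in_ch, out_ch, rng):
        self.w = (0.1 * rng.standard_normal((out_ch, in_ch))).astype(np.float32)

    def forward(self, x):
        # 1x1 convolution (a matmul) followed by ReLU, computed in float32.
        return np.maximum(x @ self.w.T, 0.0)


class FPGAFullyConnected:
    """Stand-in for the fully-connected layer offloaded to the Artix-7 FPGA.

    Weights are quantized to int8 to mimic fixed-point FPGA arithmetic.
    """

    def __init__(self, in_dim, out_dim, rng):
        w = (0.1 * rng.standard_normal((out_dim, in_dim))).astype(np.float32)
        self.scale = np.abs(w).max() / 127.0
        self.w_q = np.round(w / self.scale).astype(np.int8)

    def forward(self, x):
        # Quantize activations, run the integer matmul, then dequantize.
        x_scale = np.abs(x).max() / 127.0 + 1e-12
        x_q = np.round(x / x_scale).astype(np.int8)
        acc = x_q.astype(np.int32) @ self.w_q.astype(np.int32).T
        return acc.astype(np.float32) * x_scale * self.scale


def run_heterogeneous(batch):
    """Partition the model: float ops on the 'GPU', the FC layer on the 'FPGA'."""
    rng = np.random.default_rng(0)
    gpu_part = GPUConvBlock(in_ch=64, out_ch=32, rng=rng)
    fpga_part = FPGAFullyConnected(in_dim=32, out_dim=10, rng=rng)
    features = gpu_part.forward(batch)    # would execute on the TX2 GPU
    logits = fpga_part.forward(features)  # would be dispatched to the Artix-7
    return logits


if __name__ == "__main__":
    x = np.random.default_rng(1).standard_normal((4, 64)).astype(np.float32)
    print(run_heterogeneous(x).shape)  # (4, 10)
```

In such a split, the device boundary sits at the activations handed from the GPU partition to the FPGA partition, which is why the sketch quantizes at that hand-off; the paper's actual interface and quantization scheme are not described in the abstract.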