A Power Efficient Neural Network Implementation on Heterogeneous FPGA and GPU Devices

Y. Tu, Saad Sadiq, Yudong Tao, M. Shyu, Shu‐Ching Chen
{"title":"A Power Efficient Neural Network Implementation on Heterogeneous FPGA and GPU Devices","authors":"Y. Tu, Saad Sadiq, Yudong Tao, M. Shyu, Shu‐Ching Chen","doi":"10.1109/IRI.2019.00040","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) have seen tremendous industrial successes in various applications, including image recognition, machine translation, audio processing, etc. However, they require massive amounts of computations and take a lot of time to process. This quickly becomes a problem in mobile and handheld devices where real-time multimedia applications such as face detection, disaster management, and CCTV require lightweight, fast, and effective computing solutions. The objective of this project is to utilize specialized devices such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) in a heterogeneous computing environment to accelerate the deep learning computations with the constraints of power efficiency. We investigate an efficient DNN implementation and make use of FPGA for fully-connected layer and GPU for floating-point operations. This requires the deep neural network architecture to be implemented in a model parallelism system where the DNN model is broken down and processed in a distributed fashion. The proposed heterogeneous framework idea is implemented using an Nvidia TX2 GPU and a Xilinx Artix-7 FPGA. Experimental results indicate that the proposed framework can achieve faster computation and much lower power consumption.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IRI.2019.00040","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

Deep neural networks (DNNs) have seen tremendous industrial success in various applications, including image recognition, machine translation, and audio processing. However, they require massive amounts of computation and considerable processing time. This quickly becomes a problem on mobile and handheld devices, where real-time multimedia applications such as face detection, disaster management, and CCTV require lightweight, fast, and effective computing solutions. The objective of this project is to utilize specialized devices such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) in a heterogeneous computing environment to accelerate deep learning computations under power-efficiency constraints. We investigate an efficient DNN implementation that uses the FPGA for the fully-connected layer and the GPU for floating-point operations. This requires the DNN architecture to be implemented as a model-parallel system in which the model is partitioned and processed in a distributed fashion. The proposed heterogeneous framework is implemented using an Nvidia TX2 GPU and a Xilinx Artix-7 FPGA. Experimental results indicate that the proposed framework achieves faster computation and much lower power consumption.
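The abstract only sketches the partitioning, so below is a minimal, hypothetical Python sketch (not the authors' implementation) of the model-parallel split it describes: floating-point-heavy layers stay on the GPU while the fully-connected layer is offloaded to the FPGA. The FPGA path is mocked here with 8-bit integer arithmetic in NumPy to mimic fixed-point hardware; all names (GPUConvBlock, FPGAFullyConnected, run_heterogeneous) are assumptions made for illustration only.

```python
# Hypothetical sketch of the GPU/FPGA model-parallel split described in the
# abstract. Real systems would dispatch to the TX2 GPU and the Artix-7 FPGA;
# here both partitions are simulated in NumPy.
import numpy as np


class GPUConvBlock:
    """Stand-in for the floating-point layers that would run on the TX2 GPU."""

    def __init__(self, in_ch, out_ch, rng):
        self.w = (0.1 * rng.standard_normal((out_ch, in_ch))).astype(np.float32)

    def forward(self, x):
        # 1x1 convolution (a matmul) followed by ReLU, computed in float32.
        return np.maximum(x @ self.w.T, 0.0)


class FPGAFullyConnected:
    """Stand-in for the fully-connected layer offloaded to the Artix-7 FPGA.

    Weights are quantized to int8 to mimic fixed-point FPGA arithmetic.
    """

    def __init__(self, in_dim, out_dim, rng):
        w = (0.1 * rng.standard_normal((out_dim, in_dim))).astype(np.float32)
        self.scale = np.abs(w).max() / 127.0
        self.w_q = np.round(w / self.scale).astype(np.int8)

    def forward(self, x):
        # Quantize activations, run the integer matmul, then dequantize.
        x_scale = np.abs(x).max() / 127.0 + 1e-12
        x_q = np.round(x / x_scale).astype(np.int8)
        acc = x_q.astype(np.int32) @ self.w_q.astype(np.int32).T
        return acc.astype(np.float32) * x_scale * self.scale


def run_heterogeneous(batch):
    """Partition the model: float ops on the 'GPU', the FC layer on the 'FPGA'."""
    rng = np.random.default_rng(0)
    gpu_part = GPUConvBlock(in_ch=64, out_ch=32, rng=rng)
    fpga_part = FPGAFullyConnected(in_dim=32, out_dim=10, rng=rng)
    features = gpu_part.forward(batch)    # would execute on the TX2 GPU
    logits = fpga_part.forward(features)  # would be dispatched to the Artix-7
    return logits


if __name__ == "__main__":
    x = np.random.default_rng(1).standard_normal((4, 64)).astype(np.float32)
    print(run_heterogeneous(x).shape)  # (4, 10)
```

In such a split, the device boundary sits at the activations handed from the GPU partition to the FPGA partition, which is why the sketch quantizes at that hand-off; the paper's actual interface and quantization scheme are not described in the abstract.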