Heterogeneous system implementation of deep learning neural network for object detection in OpenCL framework

Shuai Li, Yukui Luo, K. Sun, K. Choi
2018 International Conference on Electronics, Information, and Communication (ICEIC)
DOI: 10.23919/ELINFOCOM.2018.8330645 (https://doi.org/10.23919/ELINFOCOM.2018.8330645)
Citations: 6

Abstract

One of the major challenges today is how to implement up-to-date object detection algorithms on heterogeneous systems. Deep learning neural network (DNN) algorithms have achieved very satisfactory performance, as in the 2012 Visual Object Classes Challenge (VOC) [1], but they depend on the CUDA [2] GPU framework and can only be applied on NVIDIA accelerators. We prefer a more generic acceleration framework, and OpenCL [3] is the key to meeting this requirement. Whereas CUDA targets NVIDIA GPUs only, OpenCL can be applied to heterogeneous systems that include CPUs, GPUs, DSPs, FPGAs, and other devices. Heterogeneous systems are more flexible: some are designed for portable devices, and others for low-power parallel computation. These special devices play a very important role in modern life. In this paper, we present an OpenCL-based heterogeneous system implementation and apply a DNN framework to two typical heterogeneous systems: a portable system and an FPGA system. Our work makes the following contributions: (1) We implement a generic OpenCL-based DNN object recognition framework that can be executed on general GPUs (AMD, NVIDIA, etc.). (2) We implement our framework on the embedded system Odroid XU4 [4] using multiple GPUs and improve processing time by 25.8%. (3) We implement our framework on an FPGA system and reduce power consumption by 84.3% compared with a Titan X GPU.
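To make the percentage claims concrete, here is a minimal sketch of the arithmetic, assuming the quoted figures are relative changes against a baseline (the abstract does not state the absolute measurements). The function names and the baseline value of 100.0 are hypothetical, chosen only for illustration.

```python
def percent_reduction(baseline: float, new: float) -> float:
    """Relative reduction of `new` against `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline * 100.0


def remaining_after_reduction(baseline: float, reduction_percent: float) -> float:
    """Value left after reducing `baseline` by `reduction_percent` percent."""
    return baseline * (1.0 - reduction_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical baseline in arbitrary units, NOT a measurement from the paper.
    gpu_power = 100.0
    fpga_power = remaining_after_reduction(gpu_power, 84.3)
    print(f"FPGA power: {fpga_power:.1f} units for a GPU baseline of {gpu_power:.1f}")
    print(f"Reduction recovered: {percent_reduction(gpu_power, fpga_power):.1f}%")
```

Under this reading, an 84.3% reduction means the FPGA system draws roughly 15.7% of the power of the GPU baseline.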