Heterogeneous system implementation of deep learning neural network for object detection in OpenCL framework
Authors: Shuai Li, Yukui Luo, K. Sun, K. Choi
Published in: 2018 International Conference on Electronics, Information, and Communication (ICEIC)
DOI: 10.23919/ELINFOCOM.2018.8330645 (https://doi.org/10.23919/ELINFOCOM.2018.8330645)
Citations: 6
Abstract
A major challenge today is how to implement an up-to-date object detection algorithm on a heterogeneous system. Deep learning neural network (DNN) algorithms have achieved very satisfactory performance on the 2012 Visual Object Classes (VOC) Challenge [1], but they depend on the CUDA [2] GPU framework and can run only on NVIDIA accelerators. We prefer a more generic acceleration framework, and OpenCL [3] meets this requirement. Whereas CUDA targets NVIDIA GPUs only, OpenCL can be applied to heterogeneous systems that include CPUs, GPUs, DSPs, FPGAs, and other devices. Heterogeneous systems are more flexible: some are designed for portable devices, and others for low-power parallel computation. These special-purpose devices play a very important role in modern life. In this paper, we present an OpenCL-based heterogeneous system implementation and apply a DNN framework to two typical heterogeneous systems: a portable system and an FPGA system. Our work makes the following contributions: (1) We implement a generic OpenCL-based DNN object recognition framework that can be executed on general GPUs (AMD, NVIDIA, etc.). (2) We implement our framework on the Odroid XU4 [4] embedded system using multiple GPUs and improve processing time by 25.8%. (3) We implement our framework on an FPGA system and reduce power consumption by 84.3% compared with a Titan X GPU.