Hardware-in-the-loop simulation of Android GPGPU applications

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI:10.1109/ESTIMedia.2014.6962351

Youngsub Ko, Saehanseul Yi, Youngmin Yi, Myungsun Kim, S. Ha


Citations: 4

Abstract

Emerging mobile devices are likely to adopt a CPU-GPU heterogeneous architecture in which an embedded GPU executes computations offloaded from the CPU as well as rendering tasks. For design space exploration of such an architecture at an early design stage, or for monitoring the dynamic behavior of a system, it is highly desirable to run the same application software on a full-system simulation platform without modification. Since simulations will be performed repeatedly, a compromise must be made between simulation speed and timing accuracy. Because all known GPU simulators are very slow, in this paper we propose a hardware-in-the-loop (HIL) simulation framework that integrates the CPU simulator with existing GPU hardware. A novel interfacing mechanism between the CPU simulator and the GPU hardware is devised to guarantee functional correctness. The proposed technique preserves the timing accuracy of the computation workload as much as possible, with an unavoidable penalty on the timing accuracy of CPU-GPU communication overhead. The proposed simulation framework is implemented with the gem5 full-system simulator and various kinds of GPGPU hardware. For a real-life scenario, we ported the Android platform to the proposed simulation framework and ran a face detection application that calls a native function via JNI. The native function can be written in CUDA or OpenCL if it is to be offloaded to the GPU, or in Pthreads if it is to run on the CPU. Preliminary experiments show some use cases of the proposed simulation framework for design space exploration and dynamic behavior monitoring.
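The abstract does not describe the interfacing mechanism in detail, so the following is only a minimal sketch of the general hardware-in-the-loop idea, not the authors' implementation. It assumes a toy simulator that advances a simulated clock for CPU work, runs an offloaded kernel on a real device (stood in for here by an ordinary Python callable), adds the measured execution time to the simulated clock, and models CPU-GPU communication with a fixed per-byte cost. The class `HilSimulator`, its methods, and the `transfer_ns_per_byte` parameter are all hypothetical names introduced for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class HilSimulator:
    """Toy CPU simulator that offloads kernels to a 'real' device (hypothetical)."""

    # Assumed communication model: a flat cost per transferred byte.
    transfer_ns_per_byte: int = 1
    # Host clock used to time the device; injectable so runs can be reproduced.
    clock: Callable[[], int] = time.perf_counter_ns
    # Simulated time in nanoseconds.
    sim_time_ns: int = 0

    def cpu_work(self, cycles: int, ns_per_cycle: int = 1) -> None:
        """Advance simulated time for work modelled inside the CPU simulator."""
        self.sim_time_ns += cycles * ns_per_cycle

    def offload(self, kernel: Callable[[Sequence[int]], List[int]],
                data: Sequence[int], bytes_per_elem: int = 4) -> List[int]:
        """Run `kernel` outside the simulator and fold its cost into simulated time."""
        # Host-to-device copy: modelled, not measured (the inaccurate part).
        self.sim_time_ns += len(data) * bytes_per_elem * self.transfer_ns_per_byte

        # Computation: executed for real and timed with the host clock.
        start = self.clock()
        result = kernel(data)
        self.sim_time_ns += self.clock() - start

        # Device-to-host copy: modelled again.
        self.sim_time_ns += len(result) * bytes_per_elem * self.transfer_ns_per_byte
        return result


if __name__ == "__main__":
    sim = HilSimulator()
    sim.cpu_work(cycles=1_000)
    doubled = sim.offload(lambda xs: [2 * x for x in xs], [1, 2, 3])
    print(doubled, sim.sim_time_ns)
```

In this sketch the computation time comes from a real measurement while the transfer time comes from a model, which mirrors the trade-off stated in the abstract: computation timing is kept as accurate as possible, and communication timing absorbs the error.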
