2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia): Latest Papers, Page 2

Real-time integrated face detection and recognition on embedded GPGPUs

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962350

Saehanseul Yi, Illo Yoon, Chanyoung Oh, Youngmin Yi

Citations: 22

Software platform for hybrid resource management of a many-core accelerator for multimedia applications

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962341

Sungchan Kim, Chanhee Lee, Taeyoung Kim, S. Ha

Citations: 3

Hardware-in-the-loop simulation of Android GPGPU applications

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962351

Youngsub Ko, Saehanseul Yi, Youngmin Yi, Myungsun Kim, S. Ha

Abstract: Emerging mobile devices are likely to adopt a CPU-GPU heterogeneous architecture in which an embedded GPU executes computations offloaded from the CPU as well as rendering tasks. For design space exploration of such an architecture at an early design stage, or for monitoring dynamic system behavior, it is very desirable to run the same application software on a full-system simulation platform without modification. Since simulations will be performed repeatedly, a compromise must be made between simulation speed and timing accuracy. Since all known GPU simulators are very slow, in this paper we propose a hardware-in-the-loop (HIL) simulation framework that integrates the CPU simulator with existing GPU hardware. A novel interfacing mechanism between the CPU simulator and the GPU hardware is devised to guarantee functional correctness. The proposed technique maintains the timing accuracy of the computation workload as much as possible, with an unavoidable penalty on the timing accuracy of the CPU-GPU communication overhead. The proposed simulation framework is implemented with the gem5 full-system simulator and various kinds of GPGPU hardware. For a real-life scenario, we ported the Android platform to the proposed simulation framework and ran a face detection application that calls a native function via JNI. The native function can be written in CUDA or OpenCL if it is to be offloaded to the GPU, or in Pthreads if it is to run on the CPU. Preliminary experiments show some use cases of the proposed simulation framework for design space exploration and dynamic behavior monitoring.

Citations: 4

Deterministic memory sharing in Kahn process networks: Ultrasound imaging as a case study

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962348

Andreas Tretter, Harshavardhan Pandit, Pratyush Kumar, L. Thiele

Citations: 3

Analyzing preemptive fixed priority scheduling of data flow graphs

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962345

Alok Lele, Orlando Moreira, J. Bastos, Ricardo Almeida, P. Pedreiras, K. V. Berkel

Abstract: Data flow graphs can conveniently model embedded streaming applications (ESAs), which are typically implemented as networks of concurrent tasks with an iterative, pipelined execution, where the activation of each task may be conditioned by intra- and inter-iteration data dependencies. We propose a novel analysis approach for preemptive Fixed Priority Scheduling (FPS) of multiple ESAs, assuming a fixed mapping of tasks onto the processors of the underlying Heterogeneous Multi-Processor System-on-Chip (HMPSoC). The tasks of an ESA are event-activated, have varying execution times, and participate in cyclic dependency chains, so they may not have an activation pattern that can be described using traditional periodic/sporadic event models. Instead, we propose to characterize the data flow graphs of ESAs to upper-bound the load they impose on a processor, and to use this bound to compute the worst-case response time of an actor executing on that processor at a lower priority. We show that ours is a generic approach for analyzing FPS of data flow graphs. We also propose a refinement of our technique for graphs with a dominant periodic source. In our experiments, we demonstrate an improvement over the state-of-the-art FPS analysis for data flow.

Citations: 8
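
For readers unfamiliar with the "traditional periodic event models" that the abstract above contrasts itself with, the following is a minimal sketch of the classical response-time iteration for preemptive fixed-priority scheduling of periodic tasks. It is not the method of the paper, which bounds the load of data flow graphs instead; the function name `response_time` and its parameters are illustrative assumptions.

```python
def response_time(wcet, higher, deadline):
    """Classical fixed-priority response-time iteration (periodic task model).

    wcet:     worst-case execution time of the task under analysis.
    higher:   list of (wcet, period) pairs, one per higher-priority task
              mapped to the same processor (hypothetical input format).
    deadline: the iteration gives up once the estimate exceeds this bound.
    Returns the worst-case response time, or None if it exceeds the deadline.
    All quantities are integers in the same time unit.
    """
    r = wcet
    while True:
        # Each higher-priority task can preempt ceil(r / period) times
        # within a window of length r; -(-a // b) is integer ceiling.
        interference = sum(-(-r // period) * c for c, period in higher)
        nxt = wcet + interference
        if nxt > deadline:
            return None
        if nxt == r:  # fixed point reached
            return r
        r = nxt
```

For example, `response_time(3, [(1, 4), (2, 6)], 12)` iterates 3, 6, 7, 9, 10 and settles at 10. The interference term is the quantity that has to be replaced by a graph-derived load bound when task activations are not periodic.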

Optimized memory access support for data layout conversion on heterogeneous multi-core systems

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962353

C.C.-H. Hsu, Cheng-Yen Lin, Shin-Kai Chen, Chih-Wei Liu, Jenq-Kuen Lee

Abstract: Heterogeneous multi-core systems that contain multiple CPUs and GPUs are gaining momentum, as they provide different kinds of computation power to meet the performance demands of modern applications. On such systems, developers try to fully utilize the computation power of both CPU and GPU by using emerging programming models such as CUDA and OpenCL. To achieve maximal performance, developers must carefully offload the appropriate workload to the compute devices according to the characteristics of the target architecture. In such a scenario, seamless data movement between different processors becomes crucial. Additionally, reorganizing the data layout to fit the target architecture, such as array-of-structure (AOS) for CPU, structure-of-array (SOA) for GPU, and coordinate (COO) format to ELLPACK (ELL) for sparse computation, addresses this concern. In this paper, we propose a hardware memory manager that efficiently optimizes the conversion of data layouts for heterogeneous multi-core systems on the fly. We address coalescing and sparse format conversion issues in our design. A novel ping-pong transpose architecture is devised to reorganize non-coalescing access patterns, and a histogram unit and a sparse address generator are presented to process sparse storage format transformation. Our design reduces the overhead of data transfer and layout transformation between CPU and GPU. In our experiments, our design achieves a 68.5x to 2.19x speedup over a software-based library, depending on data size.

Citations: 7
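
The two layout conversions named in the abstract above (AOS to SOA, and COO to ELL) can be illustrated in plain software. This is only a sketch of what the formats mean, not the paper's hardware design; the function names, the padding values, and the dictionary-based record format are assumptions made for illustration.

```python
def aos_to_soa(records, fields):
    """Convert an array of structures (list of dicts) into a structure of
    arrays (dict of lists), so each field becomes one contiguous list."""
    return {f: [rec[f] for rec in records] for f in fields}


def coo_to_ell(rows, cols, vals, n_rows, pad_col=-1, pad_val=0.0):
    """Convert a sparse matrix from coordinate (COO) triples to ELLPACK (ELL).

    ELL stores every row with the same width (the maximum number of
    nonzeros in any row), padding shorter rows with pad_col / pad_val.
    """
    # Count nonzeros per row to determine the common ELL row width.
    counts = [0] * n_rows
    for r in rows:
        counts[r] += 1
    width = max(counts, default=0)

    ell_cols = [[pad_col] * width for _ in range(n_rows)]
    ell_vals = [[pad_val] * width for _ in range(n_rows)]

    # Place each COO entry into the next free slot of its row.
    fill = [0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        k = fill[r]
        ell_cols[r][k] = c
        ell_vals[r][k] = v
        fill[r] += 1
    return ell_cols, ell_vals
```

The per-row counting pass is simply one software way to size the ELL rows; the abstract does not say how the paper's histogram unit or sparse address generator work internally, so no correspondence is claimed here.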