2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia): Latest Papers

Real-time integrated face detection and recognition on embedded GPGPUs
2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962350
Saehanseul Yi, Illo Yoon, Chanyoung Oh, Youngmin Yi
Abstract: Face detection and face recognition are now widely used in applications such as biometrics, surveillance, security, advertising, and entertainment. The ever-increasing input image size in face detection and the large input database in face recognition demand ever more computational power to achieve real-time processing. Recently, embedded GPUs have started to support OpenCL, and many applications can be accelerated on them as successfully as on server GPUs. In this paper, we propose several optimization techniques for Local Binary Pattern (LBP) based integrated face detection and recognition algorithms, and accelerate them to 22 fps using OpenCL on an ARM Mali GPU and 38 fps using CUDA on a Tegra K1 GPU for HD inputs. These correspond to speedups of 2.9 times and 3.7 times, respectively. To the best of our knowledge, this is the first paper to present the acceleration of face detection on embedded GPGPUs, and the first to present the performance of the Tegra K1 GPU.
Citations: 22
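For readers unfamiliar with the Local Binary Pattern feature named in this abstract, the following is a minimal sketch of the basic 3x3 LBP operator in plain Python. It is not the authors' implementation or their GPU optimization; the function names `lbp_code` and `lbp_histogram`, the clockwise bit order, and the `>=` comparison are assumptions chosen for illustration.

```python
def lbp_code(img, r, c):
    """Return the 8-bit basic LBP code for the pixel at (r, c).

    img is a list of rows of grey levels; (r, c) must not lie on the border.
    The bit order (clockwise from the top-left neighbour) is an assumption.
    """
    center = img[r][c]
    # Clockwise neighbour offsets, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for dr, dc in offsets:
        # Shift in a 1 when the neighbour is at least as bright as the centre.
        bit = 1 if img[r + dr][c + dc] >= center else 0
        code = (code << 1) | bit
    return code


def lbp_histogram(img):
    """Histogram (256 bins) of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

Each pixel's code depends only on its own 3x3 neighbourhood, which is one reason this kind of feature maps naturally onto data-parallel hardware.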
Software platform for hybrid resource management of a many-core accelerator for multimedia applications
2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962341
Sungchan Kim, Chanhee Lee, Taeyoung Kim, S. Ha
Abstract: As the incessant demand for higher computing capability makes the many-core accelerator a major computing resource in a System-on-Chip, a variety of many-core architectures and resource management techniques have been proposed recently. They usually assume a specific hardware architecture and a specific resource management scheme. In this paper, we propose a generic software platform that implements a hybrid resource management technique, targeting a wide range of many-core architectures. To evaluate the system performance more accurately before SoC fabrication, we run it on a virtual prototyping system. The actual implementation enables us to investigate the overheads involved in the proposed software platform. Preliminary experimental results confirm that the proposed software platform adapts effectively to dynamic workload variation through dynamic mapping of tasks and tolerates unexpected core failures through checkpointing. We give our perspective on future research issues to make the generic software platform a reality.
Citations: 3
Hardware-in-the-loop simulation of Android GPGPU applications
2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962351
Youngsub Ko, Saehanseul Yi, Youngmin Yi, Myungsun Kim, S. Ha
Abstract: Emerging mobile devices are likely to adopt a CPU-GPU heterogeneous architecture in which an embedded GPU executes computations offloaded from the CPU as well as rendering tasks. For design space exploration of such an architecture at the early design stage, or for monitoring the dynamic behavior of a system, it is highly desirable to run the same application software on a full-system simulation platform without modification. Since simulations are performed repeatedly, a compromise must be made between simulation speed and timing accuracy. Since all known GPU simulators are very slow, in this paper we propose a hardware-in-the-loop (HIL) simulation framework that integrates the CPU simulator with existing GPU hardware. A novel interfacing mechanism between the CPU simulator and the GPU hardware is devised to guarantee functional correctness. The proposed technique maintains the timing accuracy of the computation workload as much as possible, with an unavoidable penalty on the timing accuracy of the CPU-GPU communication overhead. The proposed simulation framework is implemented with the gem5 full-system simulator and various kinds of GPGPU hardware. For a real-life scenario, we ported the Android platform to the proposed simulation framework and ran a face detection application that calls a native function via JNI. The native function can be written in CUDA or OpenCL if it is to be offloaded to the GPU, or in Pthreads if it is to run on the CPU. Preliminary experiments show some use cases of the proposed simulation framework for design space exploration and dynamic behavior monitoring.
Citations: 4
Deterministic memory sharing in Kahn process networks: Ultrasound imaging as a case study
2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962348
Andreas Tretter, Harshavardhan Pandit, Pratyush Kumar, L. Thiele
Abstract: Kahn process networks are a popular programming model for multi-core systems. They ensure determinacy of applications by restricting processes to separate memory regions and allowing communication only over FIFO channels. However, many modern multi-core platforms concentrate on shared memory as a means of communication and data exchange. In this work, we present a concept for deterministic memory sharing in Kahn process networks. It makes it possible to take advantage of shared-memory data exchange mechanisms on such platforms while still preserving determinacy. We show how any Kahn process network can be transformed to use deterministic memory sharing by giving a set of transformations that can be applied selectively, looking at only one process at a time. We demonstrate how these techniques can be applied to an ultrasound image reconstruction algorithm. For an implementation on a test system, our technique yields significantly better performance combined with a drastically smaller memory footprint.
Citations: 3
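The baseline model this abstract starts from, processes that share no memory and communicate only through FIFO channels with blocking reads, can be sketched with Python threads and `queue.Queue`. This illustrates only the classic Kahn process network model, not the memory-sharing transformations the paper proposes; the names `producer`, `scale`, `run_network`, and the `_END` marker are hypothetical.

```python
import queue
import threading

_END = object()  # hypothetical end-of-stream marker, not part of the KPN model


def producer(out_ch, values):
    """Write every value to the output channel, then signal end of stream."""
    for v in values:
        out_ch.put(v)
    out_ch.put(_END)


def scale(in_ch, out_ch, factor):
    """Read tokens one by one (blocking) and forward them multiplied by factor."""
    while True:
        token = in_ch.get()  # blocks until a token is available
        if token is _END:
            out_ch.put(_END)
            return
        out_ch.put(token * factor)


def run_network(values, factor=2):
    """Run producer -> scale -> collector and return the collected output."""
    a, b = queue.Queue(), queue.Queue()  # unbounded FIFO channels
    threads = [
        threading.Thread(target=producer, args=(a, values)),
        threading.Thread(target=scale, args=(a, b, factor)),
    ]
    for t in threads:
        t.start()
    result = []
    while True:
        token = b.get()
        if token is _END:
            break
        result.append(token)
    for t in threads:
        t.join()
    return result
```

Because each process only performs blocking reads on its own input channel and touches no shared state, the output sequence does not depend on how the threads are scheduled.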
Analyzing preemptive fixed priority scheduling of data flow graphs
2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962345
Alok Lele, Orlando Moreira, J. Bastos, Ricardo Almeida, P. Pedreiras, K. V. Berkel
Abstract: Data flow graphs can conveniently model embedded streaming applications (ESAs), which are typically implemented as networks of concurrent tasks with an iterative, pipelined execution, where the activation of each task may be conditioned by intra- and inter-iteration data dependencies. We propose a novel analysis approach for preemptive Fixed Priority Scheduling (FPS) of multiple ESAs, assuming a fixed mapping of tasks onto the processors of the underlying Heterogeneous Multi-Processor System-on-Chip (HMPSoC). The tasks of an ESA are event-activated, have varying execution times, and participate in cyclic dependency chains, so they may not have an activation pattern that can be described using traditional periodic or sporadic event models. Instead, we propose to characterize the data flow graphs of ESAs to upper-bound the load they impose on a processor, and to use this bound to compute the worst-case response time of an actor executing on that processor at a lower priority. We show that ours is a generic approach for analyzing FPS of data flow graphs. We also propose a refinement of our technique for graphs with a dominant periodic source. Our experiments demonstrate an improvement over the state-of-the-art FPS analysis for data flow.
Citations: 8
Optimized memory access support for data layout conversion on heterogeneous multi-core systems
2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962353
C.C.-H. Hsu, Cheng-Yen Lin, Shin-Kai Chen, Chih-Wei Liu, Jenq-Kuen Lee
Abstract: Heterogeneous multi-core systems that contain multiple CPUs and GPUs are gaining momentum, as they provide different computation power to meet the performance demands of modern applications. On such systems, developers try to fully utilize the computation power of both CPU and GPU by using emerging programming models such as CUDA and OpenCL. To achieve maximal performance, developers must carefully offload the appropriate workload to the compute devices according to the characteristics of the target architecture. In this scenario, seamless data movement between different processors becomes crucial. Additionally, reorganizing the data layout to fit the target architecture, such as array-of-structure (AOS) for CPU, structure-of-array (SOA) for GPU, and conversion from coordinate (COO) format to ELLPACK (ELL) for sparse computation, addresses this concern. In this paper, we propose a hardware memory manager that efficiently optimizes the conversion of data layouts for heterogeneous multi-core systems on the fly. Our design addresses the coalescing and sparse format conversion issues. A novel ping-pong transpose architecture is devised to reorganize non-coalescing access patterns, and a histogram unit and a sparse address generator are presented to process sparse storage format transformation. Our design reduces the overhead of data transfer and layout transformation between CPU and GPU. In our experiments, our design achieves a speedup of 68.5 to 2.19 times over a software-based library, depending on data size.
Citations: 7
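The two layout conversions named in this abstract can be sketched in software as follows. This is a plain Python illustration of what AOS-to-SOA and COO-to-ELL conversion mean, not the paper's hardware memory manager; the function names, the padding values `pad_col=-1` and `pad_val=0.0`, and the dictionary-based record representation are assumptions.

```python
def aos_to_soa(records):
    """Convert a list of records (array of structures) to one list per field."""
    if not records:
        return {}
    return {key: [rec[key] for rec in records] for key in records[0]}


def coo_to_ell(rows, cols, vals, n_rows, pad_col=-1, pad_val=0.0):
    """Convert a COO sparse matrix to ELL (fixed-width rows with padding).

    rows, cols, vals are parallel lists of nonzero coordinates and values.
    Returns (ell_cols, ell_vals), each an n_rows x width list of lists.
    """
    # Histogram pass: count the nonzeros in every row.
    counts = [0] * n_rows
    for r in rows:
        counts[r] += 1
    width = max(counts, default=0)  # ELL width is the longest row

    ell_cols = [[pad_col] * width for _ in range(n_rows)]
    ell_vals = [[pad_val] * width for _ in range(n_rows)]

    # Scatter pass: place each nonzero in the next free slot of its row.
    fill = [0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        slot = fill[r]
        ell_cols[r][slot] = c
        ell_vals[r][slot] = v
        fill[r] += 1
    return ell_cols, ell_vals
```

For example, `coo_to_ell([0, 0, 1, 2], [0, 2, 1, 0], [1.0, 2.0, 3.0, 4.0], 3)` pads every row to the width of the longest one (two entries here), giving each row the same length so rows can be accessed with a regular stride.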