2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia): Latest Articles

Forget the battery, let's play games!

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962338

Benedikt Dietrich, S. Chakraborty

Citations: 14

Approximate computing for efficient information processing

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962339

Swagath Venkataramani, S. Chakradhar, K. Roy, A. Raghunathan

Citations: 2

An embedded co-processor architecture for energy-efficient stream computing

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962346

Amrit Panda, K. Chatha

Abstract: Stream processing has emerged as an important model of computation in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and exhibit large amounts of data and instruction level parallelism. Streaming applications are mainly characterized by real-time constraints that demand high throughput and data bandwidth with limited global data reuse. Conventional architectures fail to meet these demands due to their poorly matched execution models and the overheads associated with instruction and data movements. We present StreamEngine, an embedded architecture for energy-efficient computation of stream kernels. StreamEngine introduces an instruction locking mechanism that exploits the iterative nature of the kernels and enables fine-grain instruction reuse. We also adopt a Context-aware Dataflow Execution model to exploit instruction-level and data-level parallelism within the stream kernels. Each instruction in StreamEngine is locked to a Reservation Station (RS) and maintains a context that is updated upon execution; thus instructions never retire from the RS. The entire kernel is hosted in RS Banks close to functional units for energy-efficient instruction and operand delivery. We evaluate the performance and energy efficiency of our architecture on stream kernel benchmarks by implementing it in a TSMC 45nm process and comparing it with an embedded RISC processor.

Citations: 1

Quality-aware mobile graphics workload characterization for energy-efficient DVFS design

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962347

Jurn-Gyu Park, Chen-Ying Hsieh, N. Dutt, Sung-Soo Lim

Abstract: Contemporary mobile platforms use mobile GPUs for graphics-intensive applications, and deploy proprietary Dynamic Voltage Frequency Scaling (DVFS) policies in an attempt to save energy without sacrificing quality. However, there have been no previous systematic studies to correlate the performance, power, and energy efficiency of mobile GPUs based on diverse graphics workloads to enable more efficient mobile platform DVFS policies for energy savings. For the first time we present a study of mobile GPU graphics workload characterization for DVFS design considering user experience and energy efficiency on a real smartphone. We develop micro-benchmarks that stress specific stages of the graphics pipeline separately, and study the relationship between varying graphics workloads and resulting energy and performance of different mobile graphics pipeline stages. We use these results to outline opportunities for more efficient, integrated DVFS policies across the mobile GPU, memory and CPU hardware components for saving energy without sacrificing user experience. Our experimental results on the Nexus 4 smartphone show that it is important to characterize GPU hardware and graphics workloads accurately in order to achieve increased energy efficiency without degradation in graphics performance for better user experience. We believe that our observations and results will enable more energy-efficient DVFS algorithms for mobile graphics rendering in the face of rapidly changing mobile GPU architectures.

Citations: 23

System-level power & energy estimation methodology and optimization techniques for CPU-GPU based mobile platforms

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962352

S. Rethinagiri, Oscar Palomar, J. Moreno, Gulay Yalcin, O. Unsal, A. Cristal

Abstract: Due to the growing computational requirements of mobile applications, using a heterogeneous Multiprocessor System-on-Chip becomes an incontrovertible solution to meet the service requirements. Today, Electronic System-Level design is considered as a vital premise to explore design trade-offs for such devices in the early stage of the design flow. This paper proposes a novel system-level power/energy estimation methodology and optimization techniques for heterogeneous CPU-GPU based platforms. There are two parts involved in this methodology. First, we developed the power models by using functional parameters to set up generic power models for different parts of the platform. Second, we designed a simulation based system-level prototype using SystemC (JIT) and Cycle-Accurate simulators to accurately evaluate the activities used in the related power models. The combination of the two parts leads to a novel power estimation methodology at system-level, which gives a good trade-off between accuracy and speed. Moreover, leveraging our methodology, we introduce novel power optimization techniques such as inter-task DVFS and workload balancing at the system-level for CPU-GPU platforms. The efficiency of our proposed methodology and optimization techniques is validated through a CARMA kit, which consists of an ARM quad-core processor and an NVIDIA GPU processor (96 cores). Estimated power and energy values are compared to real board measurements. Our obtained power/energy estimation results provide less than 2.5% of error for single-core processor, 4% for dual-core processor, 4% for quad-core, 4% for GPU and 6% for multi-processor based systems. By using the proposed optimization techniques, we achieved significant power and energy savings of up to 45% and 70% respectively for various industrial benchmarks.
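The comparison of estimated values against board measurements described above can be illustrated with a minimal sketch. It assumes evenly spaced power samples and a simple rectangle-rule integration; the function names and all numbers are hypothetical and are not taken from the paper.

```python
def energy_joules(power_samples_w, dt_s):
    """Integrate evenly spaced power samples (watts) over time (rectangle rule)."""
    return sum(power_samples_w) * dt_s


def relative_error_percent(estimated, measured):
    """Absolute estimation error as a percentage of the measured value."""
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return abs(estimated - measured) / abs(measured) * 100.0


# Hypothetical traces sampled every 0.5 s (illustrative values only).
measured_j = energy_joules([2.0, 2.0, 4.0], dt_s=0.5)
estimated_j = energy_joules([2.0, 2.5, 3.7], dt_s=0.5)
error = relative_error_percent(estimated_j, measured_j)
```

An error figure such as "less than 2.5%" is this kind of ratio computed per platform configuration; how the paper samples power and aggregates errors is not stated in the abstract.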

Citations: 5

Optimal GEDF-based schedulers that allow intra-task parallelism on heterogeneous multiprocessors

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962343

Kecheng Yang, James H. Anderson

Citations: 24

A Bayesian network approach for compiler auto-tuning for embedded processors

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962349

Amir H. Ashouri, Giovanni Mariani, G. Palermo, C. Silvano

Abstract: The complexity and diversity of today's architectures require an additional effort from the programmers in porting and tuning the application code across different platforms. The problem is even more complex when considering that also the compiler requires some tuning, since standard optimization options have been customized for specific architectures or designed for the average case. This paper proposes a machine-learning approach for reducing the cost of the compiler auto-tuning phase and to speed up the application performance in embedded architectures. The proposed framework is based on an application characterization done dynamically with microarchitecture-independent features and based on the usage of Bayesian Networks. The main characteristic of the Bayesian Network approach consists of not describing the solution as a strict set of compiler transformations to be applied, but as a complex probability distribution function to be sampled. Experimental results, carried out on an ARM platform and GCC transformation space, proved the effectiveness of the proposed methodology for the selected benchmarks. The selected set of solutions (less than 10% of the search space) demonstrated to be very close to the optimal sequence of transformations, showing also an application performance speedup of up to 2.8 (1.5 on average) with respect to -O2 and -O3 for the cBench suite. Additionally, the proposed method demonstrated a 3× speedup in terms of search time with respect to an iterative compilation approach, given the same quality of the solutions.
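The idea of treating the solution as a probability distribution to be sampled, rather than a fixed set of transformations, can be sketched with a toy two-node network. This is only an illustration: the network structure, the probability values, and the function names are assumptions made here, and the paper's actual model and features are not given in the abstract. `-funroll-loops` and `-ftree-vectorize` are real GCC options used purely as example variables.

```python
import random

# Hypothetical conditional probability tables (made-up numbers).
P_UNROLL = 0.7                                      # P(-funroll-loops enabled)
P_VECTORIZE_GIVEN_UNROLL = {True: 0.8, False: 0.3}  # P(-ftree-vectorize | unroll)


def sample_flags(rng):
    """Draw one flag combination by ancestral sampling (parent node first)."""
    unroll = rng.random() < P_UNROLL
    vectorize = rng.random() < P_VECTORIZE_GIVEN_UNROLL[unroll]
    flags = []
    if unroll:
        flags.append("-funroll-loops")
    if vectorize:
        flags.append("-ftree-vectorize")
    return tuple(flags)


def sample_candidates(n, seed=0):
    """Return n sampled flag combinations; the seed makes runs repeatable."""
    rng = random.Random(seed)
    return [sample_flags(rng) for _ in range(n)]


candidates = sample_candidates(5)
```

Each sampled tuple would then be compiled and measured; sampling from the distribution concentrates the search on combinations the model considers promising instead of enumerating the whole space.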

Citations: 40

Buffer allocation for real-time streaming on a multi-processor without back-pressure

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962342

Hrishikesh Salunkhe, Orlando Moreira, K. V. Berkel

Abstract: The goal of buffer allocation for real-time streaming applications, modeled as dataflow graphs, is to minimize total memory consumption while reserving sufficient space for each production without overwriting any live tokens and guaranteeing the satisfaction of real-time constraints. We present a buffer allocation solution for dataflow graphs scheduled on a system without back-pressure. Our contributions are: 1) We extend the available dataflow techniques by applying best-case analysis. 2) We introduce dominator-based relative life-time analysis. For our benchmark set, it exhibits up to 12% savings on memory consumption compared to traditional absolute life-time analysis. 3) We investigate the effect of variation in execution times on the buffer sizes for systems without back-pressure. It turns out that reducing the variation in execution times reduces the buffer sizes. 4) We compare the buffer allocation techniques for systems with and without back-pressure. For our benchmark set, we show that the system with back-pressure reduces the total memory consumption by as much as 28% compared to the system without back-pressure. Our benchmark set includes wireless communications and multimedia applications.
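The requirement of never overwriting a live token can be made concrete with a generic lifetime-overlap computation. This is a minimal sketch, not the paper's dominator-based relative life-time analysis: it assumes each token's lifetime is already known as a half-open interval `[produced, consumed)`, and the function name is hypothetical.

```python
def min_buffer_slots(lifetimes):
    """Return the peak number of simultaneously live tokens.

    lifetimes: iterable of (produced, consumed) pairs, treated as
    half-open intervals [produced, consumed).
    """
    events = []
    for produced, consumed in lifetimes:
        events.append((produced, 1))    # token becomes live
        events.append((consumed, -1))   # token's slot is released
    # At equal timestamps, process releases (-1) before productions (+1),
    # so a slot freed at time t can be reused by a token produced at t.
    events.sort(key=lambda e: (e[0], e[1]))
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


slots = min_buffer_slots([(0, 3), (1, 4), (3, 5)])
```

With these example intervals at most two tokens are live at once, so two slots suffice. Wider variation in execution times stretches the intervals and raises the peak, which is consistent with the abstract's observation about buffer sizes.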

Citations: 13

Parallelization and performance prediction for HEVC UHD real-time software decoding

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962344

Junsoo Jeong, Junchul Choi, S. Ha

Citations: 1

Mapping programs for execution on pipelined MPSoCs

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia) Pub Date : 2014-11-24 DOI: 10.1109/ESTIMedia.2014.6962340

S. Parameswaran

Citations: 0