{"title":"Performance Analysis of Parallel Execution of H.264 Encoder on the Cell Processor","authors":"Jonghan Park, S. Ha","doi":"10.1109/ESTMED.2007.4375797","DOIUrl":"https://doi.org/10.1109/ESTMED.2007.4375797","url":null,"abstract":"Performance improvement by parallel execution depends on two factors: the potential parallelism of the application itself, and the optimal mapping of the application to the target architecture, which is usually very target specific. As a case study, we analyze the expected performance of parallel execution of an H.264 encoding algorithm, known as X264, on the Cell processor. Considering the communication architecture of the Cell processor, we parallelize the algorithm at the macro-block level. From the performance analysis, we discover the overhead factors of parallel execution and estimate the expected performance. Comparison with simulation results proves the accuracy and the usefulness of the proposed analysis method.","PeriodicalId":428196,"journal":{"name":"2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132122801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Network Calculus Applied to Verification of Memory Access Performance in SoCs","authors":"T. Henriksson, P. V. D. Wolf, A. Jantsch, A. Bruce","doi":"10.1109/ESTMED.2007.4375796","DOIUrl":"https://doi.org/10.1109/ESTMED.2007.4375796","url":null,"abstract":"SoCs for multimedia applications typically use only one port to off-chip DRAM for cost reasons. The sharing of interconnect and the off-chip DRAM port by several IP blocks makes the performance of a SoC under design hard to predict. Network calculus defines the concept of flow and has been successfully used to analyse the performance of communication networks. We propose to apply network calculus to the verification of memory access latencies. Two novel network elements, packet stretcher and packet compressor, are used to model the SoC interconnect and DRAM controller. We further extend the flow concept with a degree and make use of the peak characteristics of a flow to tighten the bounds in the analysis. We present a video playback case study and show that the proposed application of network calculus allows us to statically verify that all requirements on memory access latency are fulfilled.","PeriodicalId":428196,"journal":{"name":"2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116311947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MP-Queue: an Efficient Communication Library for Embedded Streaming Multimedia Platforms","authors":"A. D. Torre, M. Ruggiero, L. Benini","doi":"10.1109/ESTMED.2007.4375813","DOIUrl":"https://doi.org/10.1109/ESTMED.2007.4375813","url":null,"abstract":"In this paper we present MP-queue, a flexible and efficient queue-based communication library for MPSoCs. Our library is suitable for a wide range of hardware platforms and its configuration space is explored across a wide number of dimensions. We introduce an upper-bound evaluation metric to compare the efficiency of the library against an ideal point-to-point data transfer. We can thus quantitatively assess the overhead introduced by the synchronization protocol and by shared bus contention. We discuss source-level optimizations introduced in the library that enable aggressive compiler optimizations, without compromising code portability. A significant speedup is achieved with respect to a non-optimized library (15% for small-size messages), while communication efficiency rises up to 90% for large messages.","PeriodicalId":428196,"journal":{"name":"2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129818403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interposing Flash between Disk and DRAM to Save Energy for Streaming Workloads","authors":"Mohammed G. Khatib, B. V. D. Zwaag, P. Hartel, G. Smit","doi":"10.1109/ESTMED.2007.4375793","DOIUrl":"https://doi.org/10.1109/ESTMED.2007.4375793","url":null,"abstract":"In computer systems, the storage hierarchy, composed of a disk drive and a DRAM, is responsible for a large portion of the total energy consumed. This work studies the energy merit of interposing flash memory as a streaming buffer between the disk drive and the DRAM. Doing so, we extend the spin-off period of the disk drive and cut down on the DRAM capacity at the cost of (extra) flash. We study two different streaming applications: mobile multimedia players and media servers. Our simulated results show that for light workloads, a system with a flash as a buffer between the disk and the DRAM consumes up to 40% less energy than the same system without a flash buffer. For heavy workloads savings of at least 30% are possible. We also address the wear-out of flash and present a simple solution to extend its lifetime.","PeriodicalId":428196,"journal":{"name":"2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130338726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of Task Migration on Streaming Multimedia for Embedded Multiprocessors: A Quantitative Evaluation","authors":"M. Pittau, A. Alimonda, S. Carta, A. Acquaviva","doi":"10.1109/ESTMED.2007.4375803","DOIUrl":"https://doi.org/10.1109/ESTMED.2007.4375803","url":null,"abstract":"Dynamic task mapping solutions based on task migration have recently been explored to perform run-time reallocation of tasks to maximize performance and optimize energy consumption in MPSoCs. Even if task migration can provide high flexibility, its overhead must be carefully evaluated when applied to soft real-time applications. In fact, these applications impose deadlines that may be missed during the migration process. In this paper we first present a middleware infrastructure supporting dynamic task allocation for NUMA architectures. Then we perform an extensive characterization of its impact on multimedia soft real-time applications using a software FM Radio benchmark.","PeriodicalId":428196,"journal":{"name":"2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134521698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Quick Safari Through the MPSoC Run-Time Management Jungle","authors":"V. Nollet, D. Verkest, H. Corporaal","doi":"10.1109/ESTMED.2007.4375800","DOIUrl":"https://doi.org/10.1109/ESTMED.2007.4375800","url":null,"abstract":"The multiprocessor SoC (MPSoC) revolution is fueled by the need to execute multiple advanced multimedia applications on a single embedded computing platform. At design-time, the applications that will run in parallel and their respective user requirements are unknown. Hence, a run-time manager is needed to match all application needs with the available platform resources and services. Creating such a run-time manager requires two decisions. First, one needs to decide what functionality to implement. Second, one has to decide how to implement this functionality in order to meet boundary conditions such as real-time performance. This paper is the first to detail a generic view on MPSoC run-time management functionality and its design space trade-offs. We substantiate the run-time components and the implementation trade-offs with state-of-the-art solutions and a brief overview of some industrial and academic multiprocessor run-time management examples. We show a clear trend towards more hardware acceleration, a limited distribution of management functionality over the platform and increasing support for adaptive multimedia applications.","PeriodicalId":428196,"journal":{"name":"2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133225265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Code Placement for Reducing the Energy Consumption of Embedded Processors with Scratchpad and Cache Memories","authors":"Yuriko Ishitobi, T. Ishihara, H. Yasuura","doi":"10.1109/ESTMED.2007.4375794","DOIUrl":"https://doi.org/10.1109/ESTMED.2007.4375794","url":null,"abstract":"This paper proposes a code placement algorithm for reducing the total energy consumption of embedded processor systems including a CPU core, on-chip and off-chip memories. Our approach exploits a non-cacheable memory region for an effective use of a cache memory and as a result, reduces the number of off-chip accesses. Our algorithm simultaneously finds code layouts for a cacheable region, a scratchpad region, and the other non-cacheable region of the address space so as to minimize the total energy consumption of the processor system. Experiments using a commercial embedded processor and an off-chip SDRAM demonstrate that our algorithm reduces the energy consumption of the processor system by 23% without any performance loss compared to the best result achieved by the conventional approach.","PeriodicalId":428196,"journal":{"name":"2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"01 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124511809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}