{"title":"Variability-tolerant workload allocation for MPSoC energy minimization under real-time constraints","authors":"Francesco Paterna, A. Acquaviva, Francesco Papariello, G. Desoli, L. Benini","doi":"10.1145/2362336.2362338","DOIUrl":"https://doi.org/10.1145/2362336.2362338","url":null,"abstract":"Sub-50nm CMOS technologies are affected by significant variability, which causes power and performance variations among nominally identical cores in MPSoC platforms. This undesired heterogeneity threatens execution predictability and energy efficiency. We propose two techniques to allocate sets of barrier-synchronized tasks (representative of a wide class of image processing workloads) onto variability-affected MPSoCs. The first technique models allocation as an ILP and achieves optimal results, but requires an off-line solver. The second technique adopts a two-stage heuristic approach and can be adapted to work on-line. We tested our approach on the virtual prototype of a next-generation industrial multi-core platform. Experimental results demonstrate that our approach minimizes deadline violations while increasing energy efficiency.","PeriodicalId":104499,"journal":{"name":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128229626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A high-performance low-power H.264/AVC video decoder accelerator for embedded systems","authors":"Huang-Chih Kuo, Jian-Wen Chen, Y. Lin","doi":"10.1109/ESTMED.2009.5336823","DOIUrl":"https://doi.org/10.1109/ESTMED.2009.5336823","url":null,"abstract":"We present a high-performance, low-power, pure-hardware accelerator for decoding H.264/AVC video. We propose novel VLSI architectures for every stage of the decoding pipeline. We wrap the decoder core with an AMBA bus interface, integrate it into a multimedia SoC platform, and verify it with FPGA prototyping. To reduce external memory traffic, we propose a memory fetch unit that increases the length of burst accesses. Running at 16 MHz, our FPGA decoder prototype can decode D1 video (720×480) in real time at 30 fps. We also propose several techniques to reduce both average and peak power consumption. Simulation results show that our design consumes an average power of only 21.2 mW. The proposed H.264/AVC video decoder is suitable for embedded multimedia systems in mobile applications.","PeriodicalId":104499,"journal":{"name":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126594188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design space exploration for optimal memory mapping of data and instructions in multimedia applications to Scratch-Pad Memories","authors":"A. Iranpour, K. Kuchcinski","doi":"10.1109/ESTMED.2009.5336826","DOIUrl":"https://doi.org/10.1109/ESTMED.2009.5336826","url":null,"abstract":"In this paper, we propose a new methodology for optimal memory mapping of data and instructions to Scratch-Pad Memories (SPM). In the mapping process, our main priority is to optimize the number of memory accesses in order to minimize power consumption. Minimizing external memory accesses lowers switching activity and therefore power consumption. The optimization is done by finding Pareto points using multi-objective optimization that combines different cost functions. Our methodology is intended for real-life industrial situations, where third-party applications often need to be mapped to a specific architecture. To evaluate our methodology, we use commercial H.264 video and eAAC+ audio applications. Our experiments show that SPM is well suited to these applications for reducing external accesses, and hence power consumption, but has limited impact on overall performance. The proposed methodology provides a way to combine SPMs with caches so that this memory architecture is used optimally. Our experiments indicate that our methodology predicts SPM and external memory accesses with high accuracy: we obtained 90% accuracy when comparing the results of our methodology with those of executing the applications on a given architecture.","PeriodicalId":104499,"journal":{"name":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122574801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"QoS management of dynamic video tasks by task splitting and skipping","authors":"R. Albers, Eric Suijs, P. D. With","doi":"10.1109/ESTMED.2009.5336827","DOIUrl":"https://doi.org/10.1109/ESTMED.2009.5336827","url":null,"abstract":"We have integrated processing with deterministic and non-deterministic resource usage into a single application and evaluated its performance on a multi-core processor platform. The non-determinism stems from image analysis, which exhibits high variation in computing and memory requirements, as opposed to regular stream-oriented video processing. Quality-of-Service (QoS) control is based on resource-usage estimation functions. Scalability of sub-functions executing in parallel is achieved through the concept of task skipping and splitting, by which every video application can quickly be made scalable. The complete framework was validated for accurate latency control of an interactive medical imaging application. The proposed QoS mechanism runs fast enough to be executed in real time, and we achieved a reduction in latency jitter of almost 70% for average-case processing, so that either the quality can be significantly improved or an inexpensive hardware system can be employed.","PeriodicalId":104499,"journal":{"name":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124071984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A high-throughput, area-efficient hardware accelerator for adaptive deblocking filter in H.264/AVC","authors":"M. Nadeem, Stephan Wong, G. Kuzmanov, A. Shabbir","doi":"10.1109/ESTMED.2009.5336814","DOIUrl":"https://doi.org/10.1109/ESTMED.2009.5336814","url":null,"abstract":"In this paper, we present a high-throughput, area-efficient hardware accelerator for the deblocking filter in the H.264/AVC video compression standard. To achieve this goal, we start with algorithmic optimization and propose a novel decomposition of the filter kernels for the deblocking filter. The proposed decomposition reduces the number of adders by 51% and thereby greatly reduces the area required for its implementation. Subsequently, at the architecture level, two identical filtering units are used, and the transpose units are realized by efficiently reusing hardware resources to further reduce the area requirement. The two filtering units process the horizontal and vertical edges of the macroblock simultaneously and therefore further enhance the throughput of the hardware accelerator. Several other optimization techniques, such as reuse of intermediate results, pipelining, and merging of processing blocks on the critical path, yield a deblocking-filter accelerator with high throughput on the one hand and smaller area, in terms of equivalent gate count, on the other, compared with existing state-of-the-art hardware accelerators in the literature. Running at a clock frequency of 166 MHz and synthesized in 0.18 µm CMOS standard-cell technology, it easily meets the throughput requirements of all levels of the H.264/AVC video coding standard while requiring only 12.06 K gates (excluding SRAM).","PeriodicalId":104499,"journal":{"name":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115666462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"System-level MP-SoC design space exploration using tree visualization","authors":"T. Taghavi, A. Pimentel, M. Thompson","doi":"10.1109/ESTMED.2009.5336816","DOIUrl":"https://doi.org/10.1109/ESTMED.2009.5336816","url":null,"abstract":"The complexity of today's embedded systems forces designers to model and simulate systems and their components to explore the wide range of design choices. Such design space exploration is especially needed during the early design stages, where the design space is at its largest. Because the design space of real problems grows exponentially, evaluating and comparing every single point in it is infeasible. Therefore, heuristic search techniques, such as Evolutionary Algorithms (EA), are often used to search the design space for optimal design points using only a finite number of design-point evaluations. Understanding how such search algorithms explored the design space, and gaining insight into the “landscape” of the design space, can be invaluable to the designer. To this end, this paper presents a novel interactive visualization application, based on tree visualization, to understand the search dynamics of an evolutionary algorithm and to visualize where the optimal design points are located in the design space.","PeriodicalId":104499,"journal":{"name":"2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127796469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}