{"title":"Design of multimillion-gate multimedia SoCs: where do we stand?","authors":"S. Dutta","doi":"10.1109/ESTMED.2005.1518056","DOIUrl":"https://doi.org/10.1109/ESTMED.2005.1518056","url":null,"abstract":"Summary form only given. The paper essentially investigates how consumer demands and market dynamics continue to influence the features, the designs, and the design methodology. The paper has four main parts. Starting in the first part with how the power consumers participate in the digital revolution, the second part of the paper focuses on the market dynamics and the market demands. The third part of the paper identifies the high-level design trends as influenced by the consumers and the market. The fourth and final part of the paper describes the low-level design trends and draws the conclusions.","PeriodicalId":119898,"journal":{"name":"3rd Workshop on Embedded Systems for Real-Time Multimedia, 2005.","volume":"161 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133468592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining data and instruction memory energy optimizations for embedded applications","authors":"T. Aa, F. Catthoor, H. Corporaal, Geert Deconinck","doi":"10.1109/ESTMED.2005.1518088","DOIUrl":"https://doi.org/10.1109/ESTMED.2005.1518088","url":null,"abstract":"This paper studies the overhead of data memory optimizations on the instruction memories in embedded processors. First, it is shown that this overhead is significant, but methods exist to alleviate it. For every data optimization step causing overhead, there exists an appropriate countermeasure such that both instruction and data energy are kept low. Results on two driver applications show that although the overhead in both energy and performance can reach up to 250%, the countermeasures reduce this, such that the final result has up to four times less energy consumption and up to seven times better performance compared to the original implementation.","PeriodicalId":119898,"journal":{"name":"3rd Workshop on Embedded Systems for Real-Time Multimedia, 2005.","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124419200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A NUCA model for embedded systems cache design","authors":"P. Foglia, Daniele Mangano, C. Prete","doi":"10.1109/ESTMED.2005.1518068","DOIUrl":"https://doi.org/10.1109/ESTMED.2005.1518068","url":null,"abstract":"Embedded applications require high performance processors integrating fast and low-power cache. Dynamic non-uniform cache architectures (D-NUCA) have been proposed to overcome the performance limit introduced by wire delays when designing large caches. In this paper, we propose alternative designs of D-NUCA cache, namely triangular D-NUCA cache, to reduce power consumption and silicon area occupancy of D-NUCA cache. We compare the performance of triangular D-NUCA cache with conventional rectangular organization. Results show that our approach is particularly useful in the embedded applications domain, as it permits the utilization of half-sized NUCA cache with performance improvements.","PeriodicalId":119898,"journal":{"name":"3rd Workshop on Embedded Systems for Real-Time Multimedia, 2005.","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130869252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Frame buffer compression using a limited-size code book for low-power display systems","authors":"Hojun Shim, Youngjin Cho, N. Chang","doi":"10.1109/ESTMED.2005.1518059","DOIUrl":"https://doi.org/10.1109/ESTMED.2005.1518059","url":null,"abstract":"Modern hand-held multimedia terminals consume significant power for their quality display devices. Due to 60 Hz or higher LCD refresh operations, frame buffer memory and related buses become dominant power consumers. In this paper, we introduce an efficient frame buffer compression scheme that uses differential Huffman coding and its hardware implementation. The compression and decompression must be simple, must not incur distinct power overhead, and must involve no CPU operations. We have achieved both on-the-fly compression and high compression efficiency by devising a limited-size code book, color-difference reduction techniques and an adaptive code book update scheme. On the MobileMark 2002 benchmark, our techniques reduce the frame buffer activity by 52% to 90%, saving up to 86 mW including the overhead.","PeriodicalId":119898,"journal":{"name":"3rd Workshop on Embedded Systems for Real-Time Multimedia, 2005.","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126591219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A perception-aware low-power software audio decoder for portable devices","authors":"S. Chakraborty, Ye Wang, Wendong Huang","doi":"10.1109/ESTMED.2005.1518060","DOIUrl":"https://doi.org/10.1109/ESTMED.2005.1518060","url":null,"abstract":"We propose a new software audio decoder for processors supporting multiple discrete voltage-frequency operating points. The proposed decoding scheme allows the user to switch between multiple output quality levels, where each level is associated with a different rate at which the processor consumes energy. This is an attractive feature in battery-powered portable audio players and mobile phones, where battery-life is often more crucial than the output quality, especially in noisy environments. Towards this, the frequency range of the decoder is partitioned into multiple groups, in accordance with their perceptual relevance. When a longer battery life is desired, only the most relevant frequency components are decoded, which allows the processor to be run at a lower voltage and frequency. We have implemented this scheme using the MP3 decoder and obtained up to 95% savings in the energy consumed by the processor for AM quality output (in contrast to CD quality output, which is associated with the maximum energy consumption). This scheme is easy to implement, has no runtime overhead and does not involve any runtime voltage or frequency scaling.","PeriodicalId":119898,"journal":{"name":"3rd Workshop on Embedded Systems for Real-Time Multimedia, 2005.","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130538533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Custom processor design using NISC: a case-study on DCT algorithm","authors":"B. Gorjiara, D. Gajski","doi":"10.1109/ESTMED.2005.1518072","DOIUrl":"https://doi.org/10.1109/ESTMED.2005.1518072","url":null,"abstract":"Designing application-specific instruction-set processors (ASIPs) usually requires designing a custom datapath, and modifying instruction-set, instruction decoder, and compiler. A new alternative to ASIPs is no-instruction-set-computers (NISCs) that eliminate the instruction abstraction by compiling programs directly to a given datapath. The compiler analyzes the datapath and extracts possible operations and data flows. The NISC approach simplifies and accelerates the task of custom processor design. In this paper, we present a case-study of designing a custom datapath for a 2D DCT algorithm. We applied several optimization techniques such as software transformations, operation chaining, datapath pipelining, controller pipelining, and functional unit customization to improve the quality of the design. Most of the techniques are general and can be applied to other applications. The result of synthesizing our final custom datapath on a Xilinx FPGA shows 7.14 times performance improvement, 1.64 times power reduction, 12.5 times energy savings, and more than 3 times area reduction compared to a softcore MIPS implementation.","PeriodicalId":119898,"journal":{"name":"3rd Workshop on Embedded Systems for Real-Time Multimedia, 2005.","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125210688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Addressing computational and networking constraints to enable video streaming from wireless appliances","authors":"Saumya Chandra, S. Dey","doi":"10.1109/ESTMED.2005.1518064","DOIUrl":"https://doi.org/10.1109/ESTMED.2005.1518064","url":null,"abstract":"Enabling real-time video streaming from a wireless appliance requires compute-intensive video compression to be performed in real-time on the appliance before transmitting the data upstream. However, the tasks of real-time video encoding and streaming from the wireless appliances are challenging due to a) limited computational and battery resources, and b) limited and time-varying network bandwidth availability. In this paper, we present a technique for enabling real-time video compression and transmission from wireless appliances based on run-time video adaptation. We present an adaptation engine for dynamic selection of video compression parameters such that both the computational and the network bandwidth constraints are satisfied, while maximizing the end user's viewing quality. The algorithm is based on the analysis of the effect of different video compression parameters on computational and network resource usage, and the video quality. Since our approach is based on judicious selection of video compression parameters and does not require changes to the compression algorithm itself, it is applicable to a wide range of video compression standards. We have also developed an iPAQ-based end-to-end video streaming system to evaluate our approach. Experiments conducted on this test-bed indicate that our proposed technique achieves significant improvements in overall video quality under computation (up to 4×) and network bandwidth (~3 dB) constraints. We also show significant improvements in the energy efficiency as a result of adaptation.","PeriodicalId":119898,"journal":{"name":"3rd Workshop on Embedded Systems for Real-Time Multimedia, 2005.","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122744500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Workload characterization and cost-quality tradeoffs in MPEG-4 decoding on resource-constrained devices","authors":"Yanhong Liu, S. Chakraborty, Wei Tsang Ooi, A. Gupta, Subramanian Mohan","doi":"10.1109/ESTMED.2005.1518091","DOIUrl":"https://doi.org/10.1109/ESTMED.2005.1518091","url":null,"abstract":"There has been a lot of interest in the embedded systems community on architectures and design methods that are targeted towards multimedia applications. This trend is primarily motivated by the proliferation of resource and power-constrained portable devices (such as mobile phones and PDAs), a major portion of whose workload is made up of multimedia applications. In this paper, we investigate the tradeoffs between video quality and the processor cycle requirements in such resource-constrained devices, in the particular context of MPEG-4 decoding using an open source codec called XviD. The XviD codec implements a number of powerful coding tools from MPEG-4, which are organized as profiles and levels. Given the specification of an architecture on which an XviD decoder is implemented, the work presented here would guide a multimedia applications developer in selecting appropriate profiles and levels for the corresponding encoder application. While the selection of such profiles has so far been primarily influenced by the network bandwidth in the case of video streaming, our work stresses the importance of additionally taking into account the architecture of the device running the decoder application. Although the relevance of this observation is increasingly being realized, sufficient work has not yet been done to provide guidelines on how to systematically make such selections. This work attempts to address this shortcoming.","PeriodicalId":119898,"journal":{"name":"3rd Workshop on Embedded Systems for Real-Time Multimedia, 2005.","volume":"266 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115208080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-access optimization of embedded systems through selective inlining transformation","authors":"J. Absar, P. Marchal, F. Catthoor","doi":"10.1109/ESTMED.2005.1518077","DOIUrl":"https://doi.org/10.1109/ESTMED.2005.1518077","url":null,"abstract":"Spatial and temporal locality improvements, by loop and data-layout transformations, are today seen as key to obtaining energy and performance gains in data-dominated multimedia applications. With significant energy and time being spent in moving data between memory hierarchies, reducing the traffic by increasing locality is highly desirable. Techniques for improving locality are, however, significantly constrained by function boundaries. While global optimization - across function boundaries - is often considered an alternative, the truth is that it does not allow contextual and specialized optimizations that are possible after inlining. In this paper, therefore, we present a systematic technique which assists locality optimization techniques by selectively inlining functions that have strong data coupling between them. Results on realistic multimedia applications using our approach show an average 35% reduction in external data memory access, without any significant impact on the instruction memory (within 3%).","PeriodicalId":119898,"journal":{"name":"3rd Workshop on Embedded Systems for Real-Time Multimedia, 2005.","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132167115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Customizing 16-bit floating point instructions on a NIOS II processor for FPGA image and media processing","authors":"D. Etiemble, S. Bouaziz, L. Lacassagne","doi":"10.1109/ESTMED.2005.1518073","DOIUrl":"https://doi.org/10.1109/ESTMED.2005.1518073","url":null,"abstract":"We have implemented customized SIMD 16-bit floating point instructions on a NIOS II processor. On several image processing and media benchmarks for which the accuracy and dynamic range of this format are sufficient, a speed-up ranging from 1.5 to more than 2 is obtained versus the integer implementation. The hardware overhead remains limited and is compatible with the capacities of today's FPGAs.","PeriodicalId":119898,"journal":{"name":"3rd Workshop on Embedded Systems for Real-Time Multimedia, 2005.","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130869555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}