{"title":"Fast instruction set customization","authors":"E. Borin, F. Klein, N. Moreano, R. Azevedo, G. Araújo","doi":"10.1109/ESTMED.2004.1359704","DOIUrl":"https://doi.org/10.1109/ESTMED.2004.1359704","url":null,"abstract":"This work proposes an approach to tune embedded processor datapaths toward a specific application, so as to maximize the application performance. We customize the computation capabilities of a base processor, by extending its instruction set to include custom operations which are implemented as new specialized functional units. We describe an automatic methodology to select the custom instructions from the given application code, in a way that there is no need of compensation code or other modifications in the application, simplifying the code generation. By using the ArchC architecture description language, fast compilation and simulation of the resulting customized processor code are achieved, considerably reducing the turnaround time required to evaluate the best set of custom operations. Experimental results show that our framework provides large performance improvements (up to 3.6 times), when compared to the base general-purpose processor, while significantly speeding up the design process.","PeriodicalId":178984,"journal":{"name":"2nd Workshop onEmbedded Systems for Real-Time Multimedia, 2004. ESTImedia 2004.","volume":"184 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116868047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Homogeneous multiprocessing for the masses","authors":"P. Stravers","doi":"10.1109/ESTMED.2004.1359689","DOIUrl":"https://doi.org/10.1109/ESTMED.2004.1359689","url":null,"abstract":"Summary form only given. Processor architectures have reached a point where it is getting increasingly hard to improve their performance without resorting to complex and exotic measures. Pollack observed in 2000 that Intel processors had been \"on the wrong side of a square law\" for almost a decade. Embedded processors for consumer and telecommunication chips are now confronted with the same rule of diminishing returns. To further improve their performance, the processors are getting disproportionally bigger and consume much more energy per operation than previous generations. Traditionally, embedded systems-on-chip (SoC) have been designed as heterogeneous multiprocessors, where most processors are not programmable and a single control processor synchronizes all communication. Obvious advantages of such systems include low cost and low power consumption. In high volume products this outweighs disadvantages like a low degree of design reuse, little software reuse, and long product lead times. Despite all the hard work and good intentions it has proved difficult to establish a platform around heterogeneous SoC architectures. With the rise of non-recurrent engineering costs and an increasingly global and competitive semiconductor market, the need for a successful SoC platform is felt stronger than ever in the industry. Next to cost, the availability of qualified engineers is often even a bigger problem. Given that it is not unusual to spend several hundred man-years on software development for a single product, it is easy to see that even a multinational company can only have a very limited number of products in development at any point in time. The solution we propose is to move away from heterogeneous SoC and instead embrace homogeneous embedded multiprocessors.\nIn this talk we discuss embedded multiprocessor architectures and how they relate to programming models. We contrast heterogeneous to homogeneous architectures, and we show how the traditional efficiency gap between the two is narrowing. We also discuss issues related to hardware and software reuse, and the quest for composable systems to speed up the often lengthy process of embedded system integration.","PeriodicalId":178984,"journal":{"name":"2nd Workshop onEmbedded Systems for Real-Time Multimedia, 2004. ESTImedia 2004.","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134310661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trace-based evaluation of clock synchronization algorithms for wireless loudspeakers","authors":"P. Blum, L. Thiele","doi":"10.1109/ESTMED.2004.1359693","DOIUrl":"https://doi.org/10.1109/ESTMED.2004.1359693","url":null,"abstract":"We present an evaluation strategy for clock synchronization algorithms. It is based on a combination of measured traces, which provide for realistic performance estimation, and of simulation, which guarantees repeatability. The evaluation strategy includes parameter-optimization to allow for a fair comparison of algorithms; a general-purpose evolutionary optimizer is used for this purpose. The strategy is applied in a case study, evaluating the performance of four clock synchronization algorithms in the wireless loudspeakers application. We find that the phase-locked loop algorithm, as well as the linear-regression and the gradient algorithm achieve sufficient synchronization in a lightly loaded network. Only the local selection algorithm is able to maintain sufficient synchronization under heavy network load, as generated for example by concurrent audio or video streaming.","PeriodicalId":178984,"journal":{"name":"2nd Workshop onEmbedded Systems for Real-Time Multimedia, 2004. ESTImedia 2004.","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122408926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power saving in hand-held multimedia systems using MPEG-21 digital item adaptation","authors":"Hojun Shim, Youngjin Cho, N. Chang","doi":"10.1109/ESTMED.2004.1359694","DOIUrl":"https://doi.org/10.1109/ESTMED.2004.1359694","url":null,"abstract":"The MPEG-21 Multimedia Framework initiative aims to support a wide range of networks and devices in the delivery and consumption of multimedia resources. One of the primary goals of MPEG-21 is universal multimedia access (UMA) through Digital Item Adaptation (DIA), which supports multimedia streaming to heterogeneous terminal devices ensuring the same readability and seamlessness. We pioneer power saving of terminal devices with MPEG-21 DIA, so that the MPEG-21 DIA can also be used to support power saving, even though the framework is not primarily designed for power reduction and only limited power awareness is defined by DIA. We introduce several power-saving techniques conforming to MPEG-21 DIA specifications and show the dependency relation among introduced techniques. We achieve energy savings of up to 66% in hand-held multimedia devices with minor QoS (Quality of Service) degradation.","PeriodicalId":178984,"journal":{"name":"2nd Workshop onEmbedded Systems for Real-Time Multimedia, 2004. ESTImedia 2004.","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131238072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast design space exploration framework with an efficient performance estimation technique","authors":"Seongnam Kwon, Choonseung Lee, Sungchan Kim, Youngmin Yi, S. Ha","doi":"10.1109/ESTMED.2004.1359698","DOIUrl":"https://doi.org/10.1109/ESTMED.2004.1359698","url":null,"abstract":"This work presents the design space exploration framework that consists of two design loops: cosynthesis loop for component selection and mapping of the function blocks to the processing components, and communication DSE loop for communication architecture optimization. Before entering into the cosynthesis loop, it is critical to estimate the performance of function blocks. We also propose a performance estimation method of software function blocks considering the effect of architecture variation, compiler optimization, and data dependent behavior. It is to run the entire application with code augmentation on the instruction set simulator of the target processor. In the cosynthesis loop, the performance of the entire application is easily computed as a linear combination of function block performance values. Experimentation with real-life applications proves the viability of the proposed technique.","PeriodicalId":178984,"journal":{"name":"2nd Workshop onEmbedded Systems for Real-Time Multimedia, 2004. ESTImedia 2004.","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127120298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive spectrum-based variable bit truncation of discrete cosine transform (DCT) for energy-efficient wireless multimedia communication","authors":"Feng Liu, C. Tsui","doi":"10.1109/ESTMED.2004.1359712","DOIUrl":"https://doi.org/10.1109/ESTMED.2004.1359712","url":null,"abstract":"This work presents a new adaptive scheme to reduce the computation energy of the discrete cosine transform (DCT) architecture for image/video coding. The scheme employs the noise masking effect of quantization and the spectral difference of the quantization factors. Spectrum-based variable bit truncation is used to allocate more energy for the computations that affect the low frequency DCT coefficients more. We propose a benchmark driven search mechanism based on energy and distortion weight to find the optimal truncation sets for different quality constraints and quantization tables. They are stored in a look-up table (LUT) for the on-line reconfiguration. Simulation results show that significant energy saving is achieved with negligible quality degradation.","PeriodicalId":178984,"journal":{"name":"2nd Workshop onEmbedded Systems for Real-Time Multimedia, 2004. ESTImedia 2004.","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121666592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying \"representative\" workloads in designing MpSoC platforms for media processing","authors":"A. Maxiaguine, S. Chakraborty, Wei Tsang Ooi","doi":"10.1109/ESTMED.2004.1359702","DOIUrl":"https://doi.org/10.1109/ESTMED.2004.1359702","url":null,"abstract":"Workload design is a well-recognized problem in the domain of microprocessor design. Different program characteristics that influence the selection of a representative workload include microarchitecture-centric properties such as cache miss rates, instruction mix and accuracy of branch prediction. However, properties of a workload that are pertinent to the context of system-level design of multiprocessor SoC platforms are very different. To date, the problem of \"representative workload design\" in this specific context has not been sufficiently addressed. This work represents an attempt to address this problem in the specific case of SoC platform design for multimedia processing. Towards this, we present a method to characterize properties of multimedia workload that are relevant to SoC platform design. Based on such a characterization, we present a technique for classifying different multimedia streams. Finally, we show the utility of such a classification through a case study involving the design of a multiprocessor SoC platform for MPEG-2 decoding.","PeriodicalId":178984,"journal":{"name":"2nd Workshop onEmbedded Systems for Real-Time Multimedia, 2004. ESTImedia 2004.","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132084861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}