{"title":"An Optimized Application Architecture of the H.264 Video Encoder for Application Specific Platforms","authors":"M. Shafique, L. Bauer, J. Henkel","doi":"10.1109/ESTMED.2007.4375816","DOIUrl":"https://doi.org/10.1109/ESTMED.2007.4375816","url":null,"abstract":"The H.264 video coding standard features diverse computational hot spots that need to be accelerated to cope with the significantly increased complexity compared to previous standards. In this paper, we propose an optimized application architecture for the H.264 encoder with reduced processing and which is suitable for application specific (reconfigurable) hardware platforms. Our proposed application architecture optimization for the computational amount of the Motion Compensation (MC) is independent of the actual hardware platform that is used for execution. For a MIPS processor we achieve an average speed-up of approx. 60x for MC. Our proposed application architecture reduces the overhead for Reconfigurable Platforms by distributing the actual hardware requirements amongst the functional blocks. This increases the amount of available reconfigurable hardware per data path (within a functional block) which leads to a 2.84x performance improvement. We evaluate our application architecture by means of four different hardware platforms.","PeriodicalId":428196,"journal":{"name":"2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127576884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embedded Low Complexity JPEG2000 Videocoding System","authors":"A. Schuchter","doi":"10.1109/ESTMED.2007.4375807","DOIUrl":"https://doi.org/10.1109/ESTMED.2007.4375807","url":null,"abstract":"In this paper, we discuss an embedded hardware low complexity JPEG2000 video coding system. The hardware implementation is based on a software simulation system, where temporal redundancy is exploited by coding of differential frames which are arranged in an adaptive GOP structure. The hardware used mainly consists of a microprocessor (Analog Devices ADSP-BF533 Blackfin Processor) and a JPEG2000 chip (ADV202). DMAs (direct memory access) are introduced to optimize memory transfers in the system. It is shown that memory to memory DMAs lead to significant improvement in our proposed memory structure resulting in better performance.","PeriodicalId":428196,"journal":{"name":"2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128960426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Component Based Library Implementation of Abstract Data Types for Resource Management Customization of Embedded Systems","authors":"Christos Baloukas, Lazaros Papadopoulos, S. Mamagkakis, D. Soudris","doi":"10.1109/ESTMED.2007.4375812","DOIUrl":"https://doi.org/10.1109/ESTMED.2007.4375812","url":null,"abstract":"In this paper, we present a component based abstract data type library that can handle efficiently the dynamic data access and storage requirements of complex multimedia applications. The proposed library exhibits high levels of modularity and extensibility in its design, which allows it to be customized according to the very specific characteristics of the targeted embedded software (i.e., Software Metadata) and underlying hardware platform (i.e., Hardware Metadata). We developed more than 30 components, which can be accessed by the embedded system designer through an interface, which is very similar to the STL abstract data type interface. These components are combined automatically with our prototype tool and thus provide highly customized container implementations matching the requirements of the software applications and the available hardware resources. The applicability of our proposed framework is evaluated on different real-life complex dynamic applications.","PeriodicalId":428196,"journal":{"name":"2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121551090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Actor-Oriented Modeling and Simulation of Sliding Window Image Processing Algorithms","authors":"J. Keinert, J. Falk, C. Haubelt, J. Teich","doi":"10.1109/ESTMED.2007.4375815","DOIUrl":"https://doi.org/10.1109/ESTMED.2007.4375815","url":null,"abstract":"Embedded real-time image processing systems have to process huge amounts of data with limited resources and energy. Hence high efficiency is not only required for manual, but also for automatic system generation. Therefore, in order to allow for different optimizations, a system specification must be such that important algorithm properties are accessible to the system design software. In this paper, we present a new method how multidimensional image processing algorithms can be modeled by actor-oriented dataflow semantics. Using the example of a binary morphological reconstruction, we investigate the modeling requirements posed by point, local and global image processing algorithms. We show how they can be taken into account in our approach, so that efficient implementation and analysis in terms of buffer size and throughput is possible. In particular, by the explicit specification of the communication behavior, both static and data dependent algorithms are supported allowing for a complete system specification.","PeriodicalId":428196,"journal":{"name":"2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115938445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Modular Suite for High-Definition Image Processor Co-Verification","authors":"M. Bertola, R. Irvine","doi":"10.1109/ESTMED.2007.4375817","DOIUrl":"https://doi.org/10.1109/ESTMED.2007.4375817","url":null,"abstract":"High-end image processors are complex hardware and software systems. The verification of such devices poses a number of problems during the different phases of their life cycle. This paper proposes a modular verification suite that reuses the same test cases at every stage of the system's life cycle. It uses models of increasing fidelity with a common set of verification tools to provide consistent verification coverage. Because of the feedback that can be provided by its modular design, the suite improves continuously and can be used to increase the verification coverage for future designs.","PeriodicalId":428196,"journal":{"name":"2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"1 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132624517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive mapping to resource availability for dynamic wavelet-based applications","authors":"V. Ferentinos, B. Geelen, F. Catthoor, G. Lafruit, T. Stouraitis, R. Lauwereins, D. Verkest","doi":"10.1109/ESTMED.2007.4375802","DOIUrl":"https://doi.org/10.1109/ESTMED.2007.4375802","url":null,"abstract":"Platforms have to cope with unpredictably varying system resource requirements, because of inter-task level dynamism. To deal with this, they have to be at least partially reconfigurable. It is then important for applications to optimally exploit the memory hierarchy under varying memory availability. Moreover, in the case of intra-task dynamism, additional unpredictability is inserted and the exploration of the optimal memory hierarchy depends on data dependent application behavior. This paper presents a mapping strategy for a wavelet-based video application: depending on the encountered resource availability, switching to different memory optimized instantiations (i.e. localizations) of the application offers up to 25% energy gains in memory accesses, for a representative test sequence. We observe that it is possible to exploit the input motion characteristics in detail (that reflect the intra-task dynamism) by enabling motion-triggered switching, and further increase the achieved gains.","PeriodicalId":428196,"journal":{"name":"2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131151421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Predicated Execution for Multimedia Processing","authors":"D. Ebner, F. Brandner, A. Krall","doi":"10.1109/ESTMED.2007.4375809","DOIUrl":"https://doi.org/10.1109/ESTMED.2007.4375809","url":null,"abstract":"Modern compression standards such as H.264, DivX, or VC-1 provide astonishing quality at the cost of steadily increasing processing requirements. Therefore, efficient solutions for mobile multimedia devices have to effectively leverage instruction level parallelism (ILP), which is often achieved by the deployment of EPIC (explicitly parallel instruction computing) architectures. A characteristic architectural feature to increase the available ILP in the presence of control flow is predicated execution. Compilers targeting those hardware platforms are responsible for carefully converting control flow into conditional/predicated instructions - a process called if-conversion. We describe an effective if-conversion algorithm for the CHILI - a novel hardware architecture specifically designed for digital video processing and mobile multimedia consumer electronics. Several architectural characteristics such as the lack of branch prediction units, large delay slots, and the provided predication model are significantly different from previous work, which typically targets mainstream architectures such as Intel Itanium. The algorithm has been implemented for an optimizing compiler based on LLVM. Experimental results using a cycle accurate simulator for the well known benchmark suite MiBench and several multimedia codecs show a speed improvement of about 18% on average. On the same programs, our compiler achieves a speedup of 21% in comparison to the existing code generator based on gcc.","PeriodicalId":428196,"journal":{"name":"2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128180580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Signature-based Microprocessor Power Modeling for Rapid System-level Design Space Exploration","authors":"P. V. Stralen, A. Pimentel","doi":"10.1109/ESTMED.2007.4375798","DOIUrl":"https://doi.org/10.1109/ESTMED.2007.4375798","url":null,"abstract":"This paper presents a technique for high-level power estimation of microprocessors. The technique, which is based on abstract execution profiles called 'event signatures', operates at a higher level of abstraction than commonly-used instruction-level power simulators and should thus be capable of achieving good evaluation performance. We have compared our power estimation results to those from the instruction-level simulator Wattch. In these experiments, we demonstrate that with a good underlying power model, the signature-based power modeling technique can yield accurate estimations (a mean error of 5.5 percent compared to Wattch in our experiments). At the same time, the power estimations based on our event signature technique are at least an order of magnitude faster than with Wattch.","PeriodicalId":428196,"journal":{"name":"2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123889303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and Implementation of a Multithreaded High Resolution MPEG4 Decoder on Sandblaster DSP","authors":"V. Ramadurai, S. Jinturkar, M. Moudgill, C. Glossner","doi":"10.1109/ESTMED.2007.4375806","DOIUrl":"https://doi.org/10.1109/ESTMED.2007.4375806","url":null,"abstract":"In this paper, we describe the design, implementation and multithreading of an MPEG 4 decoder (simple profile) for high resolution (VGA 640×480) on the Sandblaster DSP. The implementation is done entirely in C. A software solution provides high reusability, low cost and short development time when compared to dedicated hardware solutions. We describe the multithreading of time critical tasks that are processor intensive as well as memory intensive.","PeriodicalId":428196,"journal":{"name":"2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123970167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-Parallel Code Generation from Synchronous Dataflow Specification of Multimedia Applications","authors":"Seongnam Kwon, Choonseung Lee, S. Ha","doi":"10.1109/ESTMED.2007.4375810","DOIUrl":"https://doi.org/10.1109/ESTMED.2007.4375810","url":null,"abstract":"Embedded software design for MPSoC needs parallel programming. Popular programming languages such as C and C++ are not adequate for initial specification since they are designed for sequential execution. Therefore models of computation that express concurrency naturally are preferred for initial specification, among which the dataflow model has been widely used to specify signal processing applications. While software generation from SDF specification has been researched extensively, data-parallelism has not been properly considered in the previous work. This paper presents a data-parallel code generation technique from SDF graphs. We use OpenMP directives to specify data-parallelism and rely on an OpenMP compiler to obtain the final target code. Preliminary experimentation with real-life examples shows the viability of the proposed technique.","PeriodicalId":428196,"journal":{"name":"2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129886209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}