{"title":"Translating data flow to synchronous block diagrams","authors":"Roberto Lublinerman, S. Tripakis","doi":"10.1109/ESTMED.2008.4697005","DOIUrl":"https://doi.org/10.1109/ESTMED.2008.4697005","url":null,"abstract":"We propose a method to automatically transform synchronous data flow diagrams into synchronous block diagrams. The idea is to use triggers, a mechanism that allows a block to be fired only at selected times. We discuss how to extend the transformation to also cover dynamic data flow diagrams where the number of tokens produced and consumed by blocks is variable. Our method allows widespread tools such as Simulink which are based on the synchronous block diagram model to be used for data flow diagrams as well.","PeriodicalId":165969,"journal":{"name":"2008 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114387878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low cost smartcams design","authors":"V. Rana, M. Matteucci, D. Caltabiano, R. Sannino, Andrea Bonarini","doi":"10.1109/ESTMED.2008.4696990","DOIUrl":"https://doi.org/10.1109/ESTMED.2008.4696990","url":null,"abstract":"Nowadays, digital image processing is the most common form of image processing. Digital image processing makes it possible to enhance image features of interest while attenuating detail irrelevant to a given application, and then extract useful information about the scene from the enhanced image. For instance, digital cameras usually include dedicated digital image processing chips in order to improve at real-time the quality of images directly on-board. This paper proposes a very low cost vision system that is able to perform image processing tasks with a wide set of digital image processing algorithms, that have been specifically optimized for the proposed architecture. Some of these algorithms can be used to detect and to track a set of previously acquired targets. Finally, the proposed solution has been proved to be an effective solution even when applied to real-world case studies, such as the detection and the tracking of test-tubes in a medical environment.","PeriodicalId":165969,"journal":{"name":"2008 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133338630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MARTE based modeling approach for Partial Dynamic Reconfigurable FPGAs","authors":"I. Quadri, S. Meftali, J. Dekeyser","doi":"10.1109/ESTMED.2008.4696994","DOIUrl":"https://doi.org/10.1109/ESTMED.2008.4696994","url":null,"abstract":"As System-on-Chip (SoC) architectures become pivotal for designing embedded systems, the SoC design complexity continues to increase exponentially necessitating the need to find new design methodologies. In this paper we present a novel SoC co-design methodology based on Model Driven Engineering using the MARTE (Modeling and Analysis of Real-time and Embedded Systems) standard. This methodology is utilized to model fine grain reconfigurable architectures such as FPGAs and extends the standard to integrate new features such as Partial Dynamic Reconfiguration supported by modern FPGAs. The goal is to carry out modeling at a high abstraction level expressed in UML (Unified Modeling Language) and following transformations of these models, automatically generate the code necessary for FPGA implementation.","PeriodicalId":165969,"journal":{"name":"2008 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"385 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116485845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Storage optimization through code size reduction for digital signal processors","authors":"Hassan A. Salamy, J. Ramanujam","doi":"10.1109/ESTMED.2008.4697006","DOIUrl":"https://doi.org/10.1109/ESTMED.2008.4697006","url":null,"abstract":"Most modern digital signal processors (DSPs) provide multiple address registers and a dedicated address generation unit (AGU) which performs address generation in parallel to instruction execution. There is no address computation overhead if the next address is within the auto-modify range. A careful placement of variables in memory is utilized to decrease the number of address arithmetic instructions and thus to generate compact and efficient code. The simple offset assignment (SOA) problem concerns the layout of variables for machines with one address register and the general offset assignment (GOA) deals with multiple address registers. Both these problems assume that each variable needs to be allocated for the entire duration of a program. Both SOA and GOA are NP-complete. In this paper, we present an effective heuristic for the general offset assignment problem with variable coalescing (CGOA) where two or more non-interfering variables can be mapped into the same memory location. Results on several benchmarks show the significant improvement of our solution compared to other heuristics. Results were further improved using simulated annealing (SA).","PeriodicalId":165969,"journal":{"name":"2008 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129789616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal address register allocation for arrays in DSP applications","authors":"Hassan A. Salamy, J. Ramanujam","doi":"10.1109/ESTMED.2008.4696998","DOIUrl":"https://doi.org/10.1109/ESTMED.2008.4696998","url":null,"abstract":"Optimizing the code size of a digital signal processing application is a crucial step in generating high quality and efficient code for embedded systems. Most modern digital signal processors (DSPs) provide multiple address registers and a dedicated address generation unit (AGU) that provides address generation in parallel to instruction execution. There is no address computation overhead if the next address is within the auto-modify range. Many DSP algorithms have an iterative pattern of references to array elements within loops. Thus, a careful assignment of array references to address registers reduces the number of explicit address register instructions as well as the execution cycles. In this paper, we present an optimal integer linear programming (ILP) formulation for the address register allocation problem (ARA) with code reconstructing techniques. Genetic algorithm is also used to solve the ARA problem to get a near-optimal solution in a reasonable amount of time for big embedded applications. Results on several benchmarks show the effectiveness of our techniques compared to other techniques in the literature.","PeriodicalId":165969,"journal":{"name":"2008 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126790380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic H.264 encoder synthesis for the Cell processor from a target independent specification","authors":"Kyunghyun Kim, Jaewon Lee, Hae-woo Park, S. Ha","doi":"10.1109/ESTMED.2008.4697004","DOIUrl":"https://doi.org/10.1109/ESTMED.2008.4697004","url":null,"abstract":"A target independent specification model, called CIC (Common Intermediate Code) has been proposed to specify an application in a fashion that all potential functional and data parallelism are explicitly defined by the programmer. After the mapping of an application to the target processors is performed to exploit the parallelism optimally, the CIC translator synthesizes the target-specific code automatically. As a case study, we specify a base-line H.264 encoding algorithm, known as x264, with CIC, and synthesize a parallel program for the Cell processor. To exploit data parallelism of macro-block processing in the motion estimation module, we introduce a novel way of representing a wave-front parallelism and a new type of channel, called array channel, in the CIC model. Preliminary experiments confirm the viability of the proposed methodology of parallel programming for multiprocessor embedded systems.","PeriodicalId":165969,"journal":{"name":"2008 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132919082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An architecture for dynamically reconfigurable real time audio processing systems","authors":"F. Bruschi, V. Rana, D. Sciuto","doi":"10.1109/ESTMED.2008.4697001","DOIUrl":"https://doi.org/10.1109/ESTMED.2008.4697001","url":null,"abstract":"In this paper we present an FPGA-based reconfigurable architecture for real time elaboration of audio streams. The architecture allows dynamically defining chains of cascading filters through which sound streams can be elaborated. Since the architecture requires only one-dimensional device reconfigurability, it can be implemented even on low cost devices, such as Xilinx Spartan3. Moreover, prior to reconfiguring a filter, sound is rerouted in order to avoid stream interruptions. This makes the architecture particularly suited for real time elaboration, for instance in the field of live music performances, for adaptive filtering or for professional digital recording systems. We show the implementation of the proposed architecture on a Xilinx Virtex4 [1], [2] based board, and present an example based on a set of filters that are loaded in series in the chain.","PeriodicalId":165969,"journal":{"name":"2008 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126086474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The OReK real-time micro kernel for FPGA-based systems-on-chip","authors":"N. Silva, Arnaldo S. R. Oliveira, Rui Santos, L. Almeida","doi":"10.1109/ESTMED.2008.4697000","DOIUrl":"https://doi.org/10.1109/ESTMED.2008.4697000","url":null,"abstract":"This paper presents the features, architecture, application design flow and evaluation of the OReK real-time micro kernel, for integration on FPGA-based SoCs for applications running on embedded softcore or hardwired processors. The features of the OReK kernel include the combined support for heterogeneous task sets, predictable shared resources synchronization, task-based interrupt servicing and runtime policing of the timing constraints, making it appropriate to manage multimedia application tasks with different types of timing properties and requirements. OReK is compact, highly configurable and supported on three distinct FPGA embedded processor architectures with different levels of performance and implementation technologies.","PeriodicalId":165969,"journal":{"name":"2008 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123255805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallelization of belief propagation method on embedded multicore processors for stereo vision","authors":"Chi-Hua Lai, Kun-Yuan Hsieh, S. Lai, Jenq-Kuen Lee","doi":"10.1109/ESTMED.2008.4696992","DOIUrl":"https://doi.org/10.1109/ESTMED.2008.4696992","url":null,"abstract":"Markov random field models provide a robust formulation of low-level vision problems. Among the problems, stereo vision remains the most investigated field. Belief propagation provides accurate results in stereo vision problems; however, the algorithm remains slow for practical use. In this paper we examine and extract the parallelisms in the belief propagation method for stereo vision on multicore processors. The results show that with parallelization exploration on multi-core processors, the belief propagation algorithm can have a 13.5 times speedup compared to the single processor implementation. The experimental results also indicate that the parallelized belief propagation algorithm on multicore processors is able to provide a frame rate of 6 frames per second.","PeriodicalId":165969,"journal":{"name":"2008 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114616011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast GPU-based space-time correlation for activity recognition in video sequences","authors":"Mahsan Rofouei, M. Moazeni, M. Sarrafzadeh","doi":"10.1109/ESTMED.2008.4696991","DOIUrl":"https://doi.org/10.1109/ESTMED.2008.4696991","url":null,"abstract":"Action recognition is becoming an important component of many computer vision applications such as video surveillance, video indexing and browsing. However most of the space time approaches to action recognition are very computationally expensive which prevents us from using them in real-time applications. This paper describes how Graphic Processing Units (GPUs) can be used in the field of action recognition to speed up this process. We implement a space-time behavior based correlation scheme on NVIDIA Quadro FX 5600 GPU and gain a 50x speedup over its counterpart CPU implementation.","PeriodicalId":165969,"journal":{"name":"2008 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia","volume":"649 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132126228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}