M. D. Galanis, G. Dimitroulakos, A. Kakarountas, C. Goutis
{"title":"Speedups from partitioning software kernels to FPGA hardware in embedded SoCs","authors":"M. D. Galanis, G. Dimitroulakos, A. Kakarountas, C. Goutis","doi":"10.1109/SIPS.2005.1579917","DOIUrl":"https://doi.org/10.1109/SIPS.2005.1579917","url":null,"abstract":"This paper presents a hardware/software partitioning methodology for improving performance in single-chip systems comprised by processor and reconfigurable logic. The reconfigurable logic is realized by field programmable gate array technology. Critical software parts are selected for acceleration on the reconfigurable logic. A generic hybrid system-on-chip platform, which can model the majority of existing processor-FPGA systems, is considered by the method. The partitioning method uses an automated kernel identification process at the basic-block level for detecting critical software portions. Three different instances of the generic platform and two sets of benchmarks are used in the experiments. The analysis on five real-life applications showed that these applications spend an average of 69% of their instruction count in 11% on average of their code. The extensive experimentation illustrates that for the systems composed by 32-bit processors the speedup of five applications ranges from 1.3 to 3.7 relative to an all software solution. For a platform composed by an 8-bit processor, the performance gains of eight DSP algorithms are considerably greater, since the average speedup equals 28.","PeriodicalId":436123,"journal":{"name":"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.","volume":"163 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115997218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the way to an H.264 HW/SW reference model: a SystemC modeling strategy to integrate selected IP-blocks with the H.264 software reference model","authors":"I. Amer, M. Sayed, Wael Badawy, G. Jullien","doi":"10.1109/SIPS.2005.1579860","DOIUrl":"https://doi.org/10.1109/SIPS.2005.1579860","url":null,"abstract":"SystemC is a new hardware design concept that enables the designer to perform early functional verification of developed hardware blocks by facilitating their integration with software in a unified platform. It provides hardware-oriented constructs within the context of C++ as a class library implemented in standard C++. In this paper, we propose a strategy that enables us to emulate a model of a full HW/SW H.264 encoder. The latest reference software is modified by allowing selected computationally extensive modules to be optionally executed in emulated hardware. SystemC is used for hardware modeling. The proposed strategy enables us to perform early functional verification and conformance analysis of the IP-blocks at the system level of abstraction.","PeriodicalId":436123,"journal":{"name":"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115114767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Navier-Stokes processor for biomedical applications","authors":"V. Zygouris, K. Karagianni, T. Stouraitis","doi":"10.1109/SIPS.2005.1579895","DOIUrl":"https://doi.org/10.1109/SIPS.2005.1579895","url":null,"abstract":"VLSI implementation issues in the design of a parallel processor for the solution of a set of Navier-Stokes (NS) equations which model the flow of blood through a stenosis are discussed in this paper. Specifically, the Navier-Stokes equations and the Poisson equation are used for the calculation of the velocities and pressure of the blood in the stenosis. Selection of the stenosis model, definition of the computation grid, selection of the initial conditions and boundary conditions, discretization of the original equations, software simulations with the SIMPLER method are discussed. The impact of these choices on VLSI architecture complexity is investigated.","PeriodicalId":436123,"journal":{"name":"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125294213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image quality assessment metrics based on multi-scale edge presentation","authors":"Guangtao Zhai, Wenjun Zhang, Xiaokang Yang, Yi Xu","doi":"10.1109/SIPS.2005.1579888","DOIUrl":"https://doi.org/10.1109/SIPS.2005.1579888","url":null,"abstract":"We propose two image quality assessment metrics named multi-scale modular similarity (M²S) and multi-scale modular maxima similarity (M³S). It has been well known 1) multi-scale analysis is an effective decomposition technique in image processing, and 2) contours and edges analyses are crucial in the understanding of natural scenes. Motivated by these two facts, we attempt to develop quality assessment metrics using multi-scale edges presentation. We decompose an image with un-decimated dyadic wavelet transform, and then develop M²S metric to evaluate the quality of images by comparing the modulus across scales of wavelet transform. Multi-scale edges are defined as local maxima of modulus, which often contain the most important information of the image. As a further step of M²S metric, M³S only uses the multi-scale edge information. M³S is therefore essentially a reduced-reference image quality metric. Extensive experiments indicate that in most cases, the prediction abilities of these two proposed metrics are similarly excellent and both outperform the widely used PSNR and the simple structural similarity metrics.","PeriodicalId":436123,"journal":{"name":"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128579943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From NoC security analysis to design solutions","authors":"S. Evain, J. Diguet","doi":"10.1109/SIPS.2005.1579858","DOIUrl":"https://doi.org/10.1109/SIPS.2005.1579858","url":null,"abstract":"This paper addresses a new kind of security vulnerable spots introduced by network-on-chip (NoC) use in system-on-chip (SoC) design. This study is based on the experience of a CAD framework for NoC design and proposes a classification of weaknesses with regard to usual routing and interface techniques. Finally design strategies are proposed and a new path routing technique (SCP) is introduced with the aim to enforce security.","PeriodicalId":436123,"journal":{"name":"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126185027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low power techniques for MP3 audio decoder using subband cut-off approach","authors":"T. Tsai, Chun-Kai Wang, Chun-Nan Liu","doi":"10.1109/SIPS.2005.1579877","DOIUrl":"https://doi.org/10.1109/SIPS.2005.1579877","url":null,"abstract":"In this paper low power techniques for MP3 audio decoder are used based on a phenomenon of zero-filled subbands. By our analysis, large part of subbands contains useless zero-valued data, which can be cut-off. We propose an effective architecture to gain incremental improvements in power consumption and computation complexity. In DWIMDCT block, the computational complexity can be reduced to 34% when we combine the subband cut-off technique and our proposed fast algorithm. In SIMDCT block of synthesis filterbank, the same technique can be applied and the computational complexity can be reduced to 17%. This subband cut-off approach is simple and can be easily integrated with other designs.","PeriodicalId":436123,"journal":{"name":"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125611791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diagonal low-density parity-check code for simplified routing in decoder","authors":"E. Kim, G. Choi","doi":"10.1109/SIPS.2005.1579966","DOIUrl":"https://doi.org/10.1109/SIPS.2005.1579966","url":null,"abstract":"We propose a novel low-density parity-check (LDPC) decoder design methodology by introducing a special code named diagonal-LDPC (DLDPC) code. An LDPC code, defined by a parity check matrix H, can be represented by a bipartite graph. To address the complex routing problem in the LDPC decoder implementation, a partitioned bipartite-graph code is proposed and generalized to a DLDPC code having constraint of positioning 1's near the diagonal area. This structured code simplifies the routing problem [Y. Kou et al, 2001] [R.M. Tanner et al, 2001] [H. Zhang et al, 2003] and enables cell-based highly regular fully-parallel decoder design without compromising the code performance.","PeriodicalId":436123,"journal":{"name":"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133412250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SIMD implementation of interpolation in algebraic soft-decision Reed-Solomon decoding","authors":"L. Boulianne, W. Gross","doi":"10.1109/SIPS.2005.1579965","DOIUrl":"https://doi.org/10.1109/SIPS.2005.1579965","url":null,"abstract":"The Koetter-Vardy algorithm is an algebraic soft-decision decoding algorithm for Reed-Solomon codes. Software implementations of the Koetter-Vardy algorithm are considered as part of a redecoding architecture that augments a hardware hard-decision decoder with soft-decision decoding software on an embedded processor. In this paper we investigate the implementation of the interpolation step of the Koetter-Vardy algorithm on SIMD processor architectures. A parallelization of the algorithm is given using the K'th order Horner's rule for parallel polynomial evaluation. The SIMD algorithm has a running time 2.5 to 4 times faster than a serial implementation on a DSP processor. To gain further speedup we propose a merged-SIMD architecture that calculates the Hasse derivative in parallel with the polynomial updates.","PeriodicalId":436123,"journal":{"name":"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130483161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient motion vector refinement architecture for sub-pixel motion estimation systems","authors":"T. Dias, N. Roma, L. Sousa","doi":"10.1109/SIPS.2005.1579885","DOIUrl":"https://doi.org/10.1109/SIPS.2005.1579885","url":null,"abstract":"This paper proposes a new, scalable and efficient VLSI architecture for real-time sub-pixel motion estimation. The proposed structure is optimized for search strategies using small search ranges, such as hierarchical or sub-pel refinement algorithms. Based on the proposed architecture, a highly modular and configurable motion estimation co-processor capable of estimating optimal motion vectors with any given accuracy and using any known interpolation algorithm is presented. The performance of this processing structure was evaluated by embedding it in a two-level motion estimation system with minimum memory bandwidth requirements, that estimates half-pixel accurate motion vectors using a two-step search procedure. Experimental results for implementations on ASIC and FPGA devices show that by using the proposed architecture it is possible to estimate motion vectors up to the 4CIF image format, in real-time with any given sub-pixel accuracy.","PeriodicalId":436123,"journal":{"name":"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116268577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel implementations with low-complexity of rotation-based adaptive filters","authors":"M. Bhouri","doi":"10.1109/SIPS.2005.1579936","DOIUrl":"https://doi.org/10.1109/SIPS.2005.1579936","url":null,"abstract":"This paper proposes a new parallel implementation of some rotation-based adaptive filters (Bhouri, M, 2000). These filters are characterized by a robust behavior to input signal correlation (Bhouri, M, et al, 1998) and good numerical properties. However, their implementations have reduced complexities. The circuits based on these block-diagonal adaptive algorithms use less computing cells than the systolic circuit of the QR-RLS algorithm. Nevertheless, these new and low-complexity architectures have no longer a pipeline structure.","PeriodicalId":436123,"journal":{"name":"IEEE Workshop on Signal Processing Systems Design and Implementation, 2005.","volume":"24 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121015498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}