{"title":"On implementation and design of filter banks for subband adaptive systems","authors":"Stephan Weiss, M. Harteneck, R. Stewart","doi":"10.1109/SIPS.1998.715780","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715780","url":null,"abstract":"We introduce a polyphase implementation and design of an oversampled K-channel generalized DFT (GDFT) filter bank, which can be employed for subband adaptive filtering, and therefore is required to have a low aliasing level in the subband signals. A polyphase structure is derived which can be factorized into a real-valued polyphase network and a GDFT modulation. For the latter, an FFT realization may be used, yielding a very inexpensive polyphase implementation for arbitrary integer decimation ratios N≤K. We also present an analysis underlining the efficiency of complex-valued subband processing. The design of the filter bank is completely based on the prototype filter and solved using a fast converging iterative least squares method, for which we give examples. The design specifications closely correspond with performance limits of subband adaptive filtering, which are underpinned by simulation results.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115556517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VLSI design and implementation of low-complexity adaptive turbo-code encoder and decoder for wireless mobile communication applications","authors":"S. Hong, J. Yi, W. Stark","doi":"10.1109/SIPS.1998.715786","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715786","url":null,"abstract":"A low-complexity multi-stage pipeline turbo-code encoder and decoder architecture for wireless mobile communication applications is presented. The VLSI decoder architecture presented in this paper avoids complex operations such as exponent and logarithmic computations. The algorithm simplification results in a very efficient low-complexity suboptimal digital implementation. Furthermore, the communication channel statistical estimation process which involves a large number of complex operations is greatly simplified with minor performance degradation. The architecture incorporates simple decision logic that checks for the iteration termination condition. The number of iterations is made to be adaptive and the power-down mode is incorporated. The entire encoder/decoder is implemented with 0.6-μm CMOS technology using the EPOCH computer aided design tool.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123942480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Study of cache system in video signal processors","authors":"Z. Wu, W. Wolf","doi":"10.1109/SIPS.1998.715765","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715765","url":null,"abstract":"Memory system design is especially important for video signal processing, where the video signal processor (VSP) not only requires a lot of data, but also needs a very high bandwidth and low latency. While caches become ubiquitous in modern systems, their performance still falls behind that of the processors. Therefore a number of modifications to traditional caches have emerged: victim cache, stream buffer, data prefetching techniques, etc. However, few people have studied cache memory for VSP. We present a case study based on extensive trace-driven scheduling, which shows that while stream buffer and stride prediction table are very effective for streaming video data, they should be applied in a different way in dedicated VSP with higher degrees of parallelism than in current super-scalar workstation architectures.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"107 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122641268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimized software synthesis for digital signal processing algorithms: an evolutionary approach","authors":"Jürgen Teich, E. Zitzler, S. Bhattacharyya","doi":"10.1109/SIPS.1998.715822","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715822","url":null,"abstract":"Based on the model of synchronous data flow (SDF), so-called single appearance schedules are known to provide memory-optimal schedules. Among these, the problem of buffer memory optimization is treated: (1) an evolutionary algorithm (EA) is applied to efficiently explore the (in general) exponential search space of actor firing orders; (2) for each order, the buffer costs are evaluated by applying a dynamic programming post-optimization step (GDPPO).","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116631776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A system-on-chip design of a low-power smart vision system","authors":"W. Fang","doi":"10.1109/SIPS.1998.715769","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715769","url":null,"abstract":"A low-power smart imager design is proposed for real-time machine vision applications. It takes advantage of recent advances in integrated sensing/processing designs, electronic neural networks, and sub-micron VLSI technology. The smart vision system integrates an active pixel camera with a programmable neural computer and an advanced microcomputer. A system-on-a-chip implementation of this smart vision system is shown to be feasible by integrating the whole system into a 3-cm×3-cm chip design in a 0.18 μm CMOS technology. The on-chip neural computer provides one tera-operation-per-second computing power for various parallel vision operations and smart sensor functions. Its high performance is due to massively parallel computing structures, high data throughput rates, fast learning capabilities, and system-on-a-chip implementation. This highly integrated smart imager can be used for various scientific missions and other military, industrial or commercial vision applications.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"25 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132062687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of DCT-domain motion estimation and compensation","authors":"R. Kleihorst, F. Cabrera","doi":"10.1109/SIPS.1998.715768","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715768","url":null,"abstract":"Hybrid video compression schemes store an entire image for predictive coding. Traditionally, this image is stored in the time domain needing almost 5 Mbit of memory for main-level image format. The amount of storage space can be reduced if the data is stored in the runlength encoding-discrete cosine transform (RLE-DCT) domain, even using the available compression and buffer-control algorithm to guarantee a storage amount. We show that a hardware implementation is feasible and worthwhile in relation to traditional encoders. Motion estimation is performed by recursive block-matching, at sub-pixel accuracy.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125302086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pentium-MMX-based implementation of a digital copier","authors":"Jae-Woo Ahn, Wonyong Sung","doi":"10.1109/SIPS.1998.715777","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715777","url":null,"abstract":"We develop real-time image processing programs for a digital copier using a general-purpose microprocessor. To exploit the inherent data parallelism in many image processing algorithms, we use Intel's Pentium processor with multimedia extension (MMX). Each step of the digital copier process including the X-Zoom and the error diffusion halftoning is aggressively optimized for the Pentium MMX processor. The X-Zoom process that is based on the linear interpolation method is optimized using the software pipelining technique. For the error diffusion halftoning which requires nonlinear feedback, we exploit both the control-level and data-level parallelism. For the latter approach, a speculative quantization method is developed to break the dependency relation due to feedback and quantization operations. Our implementation acquires the maximum throughput of 30 ppm for A4-size paper using one 166 MHz Pentium MMX CPU, which is approximately five times faster than the code without MMX optimization.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128564995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconfigurable signal processor for channel coding and decoding in low SNR wireless communications","authors":"S. Halter, M. Oberg, P. Chau, P. Siegel","doi":"10.1109/SIPS.1998.715789","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715789","url":null,"abstract":"An area and computational-time efficient turbo decoder implementation on a reconfigurable processor is presented. The turbo decoder takes advantage of the latest sliding window algorithms to produce a design with minimal storage requirements as well as offering the ability to configure key system parameters via software. The parameter programmability allows the decoder to be used in a research environment to study less understood aspects of turbo codes.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121732834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Field programmable DSP transform arrays","authors":"N. Venkateswaran, A. Murugavel, G. Chandramouli","doi":"10.1109/SIPS.1998.715778","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715778","url":null,"abstract":"Several dedicated VLSI architectures have been proposed in the literature for performing most of the linear transforms like DCT, DST, FFT and AFT, having orthogonal basis functions, though not much on the non-orthogonal transforms like the Gabor. We present a hardware-reconfigurable architecture with which almost all these linear, nonlinear, orthogonal and non-orthogonal transforms can be performed. Also the reconfigurability helps perform inverse transforms. The architecture has DCT, DST, and arithmetic blocks respectively and a switching network. This switching network is hardware-reconfigurable to make interconnections between the DCT, the DST and the arithmetic blocks. With the help of the switching network, the multiplier units and adder/subtractor units present in the arithmetic blocks can be reconfigured to perform inner product (IP) operations. It is well-known that IP plays a major role in DSP. The structure of the architecture is such that the user (in the field) can easily program the different hardware reconfigurations for performing either single, multiple or multi-dimensional forward and reverse transforms. This reconfigurable architecture is suitable for an MCM type of implementation.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133247652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Loop Scheduling Algorithm For Timing And Memory Operation Minimization With Register Constraint","authors":"F. Chen, S. Tongsima, E. Sha","doi":"10.1109/SIPS.1998.715821","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715821","url":null,"abstract":"In this paper, we present a novel scheduling framework, called Memory Operation minimization Rotation Scheduling (MORS), for scheduling multi-dimensional applications subject to register constraint and other resource constraints. Under such constraints, MORS strives to shorten the schedule length while minimally inserting the load and store operations in the schedule to reduce the register requirement pressure. Experiments show that our approach is useful for reducing the schedule length without violating the register constraint of a target machine. Furthermore, the average reduction in the schedule length produced by these experiments reaches 36.6%.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124981896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}