{"title":"Novel VLSI multi-bit coded multiplier and multiplier-accumulator architectures for DSP applications","authors":"D. Poornaiah, P. A. Ananda Mohan","doi":"10.1109/VLSISP.1995.527525","DOIUrl":"https://doi.org/10.1109/VLSISP.1995.527525","url":null,"abstract":"In this paper we propose two new algorithms for (i) concurrent computation of odd digit partial products (PPs) and the inner-product-step and (ii) minimization of sign extension bits and map them onto a novel concurrent VLSI architecture based on carry-save 4:2/7:3 compressors for designing efficient multi-bit coded multipliers and multiplier-accumulator (MAC) cells. The use of the proposed architecture results in the total elimination of the separate adder modules normally required for performing the odd-digit PP computation and the inner-product step. Besides, there is a reduction in the input data path complexity of the multiplexers from O(2^(k-1)) in the conventional schemes to O(k). As a result, approximate reductions ranging from 15% to 40% in the computation time and area are achieved along with reduced number of interconnections making the proposed schemes highly attractive for VLSI implementation for performing multi-bit recoding even for k>6, k being the recoding size. This important feature makes the proposed architecture attractive also to be used in low-power and pipelined DSP applications.","PeriodicalId":286121,"journal":{"name":"VLSI Signal Processing, VIII","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128243392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast VLSI overlapped transform kernel","authors":"E. Deprettere, G. Hekstra, R. Heusdens","doi":"10.1109/VLSISP.1995.527500","DOIUrl":"https://doi.org/10.1109/VLSISP.1995.527500","url":null,"abstract":"Transforms of image sequences are commonly compositions of discrete cosine/sine transforms. The advantage is that the N-point DCT/DST requires only Nlog(N) multiplications when properly decomposed in the well known butterfly structure. The butterfly decomposition removes redundancy in the transform. This provides speed-up but not so much cost reduction because numerical sensitivity sets a price on the implementation. An alternative way is to guarantee robustness, by relying on orthogonal arithmetic, and exploiting this robustness to make computations inexpensive and, therefore, transformations fast. This concept and its merits are the subjects of this paper. An example from image coding is given.","PeriodicalId":286121,"journal":{"name":"VLSI Signal Processing, VIII","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131367307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Normalised Givens rotations for recursive least squares processing","authors":"J. McWhirter, R. Walke, J. Kadlec","doi":"10.1109/VLSISP.1995.527503","DOIUrl":"https://doi.org/10.1109/VLSISP.1995.527503","url":null,"abstract":"An algorithm for recursive least squares optimisation based on the method of QR decomposition by Givens rotations is reformulated in terms of parameters whose magnitude is never greater than one. In view of the direct analogy to statistical normalisation, it is referred to as the normalised Givens rotation algorithm. An important consequence of the normalisation is that most of the resulting least squares computation may be carried out using fixed point arithmetic. This should enable the design of a much simpler application specific integrated circuit to implement the Givens rotation processor for adaptive filtering and beamforming.","PeriodicalId":286121,"journal":{"name":"VLSI Signal Processing, VIII","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132604087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A low-power video decoder with power, memory, bandwidth and quality scalability","authors":"N. Chaddha, T. Meng","doi":"10.1109/VLSISP.1995.527516","DOIUrl":"https://doi.org/10.1109/VLSISP.1995.527516","url":null,"abstract":"This paper describes a low-power scalable video decoder for use in portable video applications. The scalable video decoder uses tree structured vector quantization (TSVQ) of perceptually weighted block transforms. The subjective quality of compressed images improves significantly by the use of perceptual distortion measures. The low-complexity, low-power architecture requires only table-lookups to perform video decompression. Inverse transforms are performed as pre-processing steps in the tables. Color conversion from YUV to RGB and color quantization are also performed as pre-processing steps in the tables. The video decoder provides a trade-off between rate-distortion, power and memory size. This allows to trade-off power and memory size for better quality of compressed video and vice-versa. The power consumption of our video decoder is orders of magnitude smaller than other decoders in existing technology. Measured performance shows that the scalable video decoder consumes between 50 to 150 micro-watt with a 1.5 V power supply in 0.8 μm CMOS technology for 160×240 resolution video at 30 frames per second.","PeriodicalId":286121,"journal":{"name":"VLSI Signal Processing, VIII","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131254831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embedded memory module design for video signal processing","authors":"Tian-Sheuan Chang, C. Jen","doi":"10.1109/VLSISP.1995.527521","DOIUrl":"https://doi.org/10.1109/VLSISP.1995.527521","url":null,"abstract":"Two embedded memory designs are proposed and implemented for video signal processing. Complying with the features of video signal processing, concurrent line access emulates the multiport capability with single port cell hardware and little access time overhead. Layout area is 56% of two port implementation for size 2 Kb. Block access mode provides fast addressing (26% faster than conventional scheme for size 256 w×32 b). Although these two fast modes exhibit some restriction of prefer-access-order, it is no loss of generality because video signal processing algorithms possess high data parallelism and less dependency.","PeriodicalId":286121,"journal":{"name":"VLSI Signal Processing, VIII","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115661772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A programming environment for HD-Picot system","authors":"Y. Itoh, N. Yagi, K. Fukui, K. Enami, N. Sasaki","doi":"10.1109/VLSISP.1995.527482","DOIUrl":"https://doi.org/10.1109/VLSISP.1995.527482","url":null,"abstract":"A programming environment for the multi-processor system-HD-Picot System-has been developed. HD-Picot System is a programmable real-time video signal processing system. It has an architecture suitable for video-rate processing of signals, including both conventional TV and high definition TV. The programming environment transforms a video processing application written in graphical form into code sets executable on HD-Picot System. It consists mainly of a graphical editor and an integrated compiler which breaks up a signal flow graph, compensates timing differences between signal paths, generates programs, and displays control panel on a screen of a workstation.","PeriodicalId":286121,"journal":{"name":"VLSI Signal Processing, VIII","volume":"7 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126143603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Architectural synthesis of an image processing algorithm using IRIS","authors":"D. Trainor, Roger Francis Woods, J. McCanny","doi":"10.1109/VLSISP.1995.527488","DOIUrl":"https://doi.org/10.1109/VLSISP.1995.527488","url":null,"abstract":"Details are presented of the IRIS synthesis system for high-performance digital signal processing. This tool allows non-specialists to automatically derive VLSI circuit architectures from high-level, algorithmic representations, and provides a quick route to silicon implementation. The applicability of the system is demonstrated using the design example of a one-dimensional Discrete Cosine Transform circuit.","PeriodicalId":286121,"journal":{"name":"VLSI Signal Processing, VIII","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131306827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved minimum-order augmented pipelining algorithm","authors":"K. Chang, XuDuan Lin","doi":"10.1109/VLSISP.1995.527520","DOIUrl":"https://doi.org/10.1109/VLSISP.1995.527520","url":null,"abstract":"Firstly, a comparison of overall performance between minimum-order augmented pipelining (MAP) clustered look-ahead (CLA) filters and scattered look-ahead (SLA) filters is extensively achieved. From the results, the MAP algorithm is revealed to have a problematic domain searching procedure for finding optimized pipelining coefficients. To solve the problems and improve the numerical performance, we propose an improved MAP (IMAP) algorithm, which is especially beneficial for high-Q ultra-high-speed digital filters. The IMAP algorithm is optimized in the aspect of the minimization of augmented pipelining order and undesirable quantization effects simultaneously. Performance of IMAP is compared with that of the conventional MAP algorithm, to demonstrate the advantage of the IMAP algorithm.","PeriodicalId":286121,"journal":{"name":"VLSI Signal Processing, VIII","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130031318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A scalable systolic array architecture for 2D discrete wavelet transforms","authors":"J. Chen, M. Bayoumi","doi":"10.1109/VLSISP.1995.527501","DOIUrl":"https://doi.org/10.1109/VLSISP.1995.527501","url":null,"abstract":"A systematic synthesis approach has been developed for scalable systolic array architecture for a 2D discrete wavelet transform (DWT) based on the data dependence analysis and linear index space transformation. The proposed architecture has regular topology, local routing, simple controller and high throughput rate. It can be easily extended to different parameters of various levels, macroblocks and filters. The derived architecture has been prototyped using Cadence Edge Framework.","PeriodicalId":286121,"journal":{"name":"VLSI Signal Processing, VIII","volume":"266 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131921002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable implementation scheme for multirate FIR filters and its application in efficient design of subband filter banks","authors":"Po-Cheng Wu, Liang-Gee Chen, T. Chiueh","doi":"10.1109/VLSISP.1995.527505","DOIUrl":"https://doi.org/10.1109/VLSISP.1995.527505","url":null,"abstract":"A scalable implementation scheme for multirate FIR filters in consideration of both the processing time and the silicon area is presented in this paper. According to our various requirements, the flexible and efficient implementation scheme can simultaneously reduce both the time cost from T to T/k and the area cost from A to kA/M (M is the decimation or interpolation rate, k is any factor of M). Furthermore, by employing the scalable implementation scheme, we also propose an efficient design technique for subband filter banks.","PeriodicalId":286121,"journal":{"name":"VLSI Signal Processing, VIII","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122030803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}