{"title":"Vlsi Systolic Array Implementation Of A Staged Decoder For Bcm Signals","authors":"G. Caire, J. Ventura-Traveset, J. Murphy, S. Kung","doi":"10.1109/VLSISP.1992.641046","DOIUrl":"https://doi.org/10.1109/VLSISP.1992.641046","url":null,"abstract":"","PeriodicalId":210565,"journal":{"name":"Workshop on VLSI Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133189013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Systematic Folding Design Procedure For A 1-D RLS Systolic Array","authors":"F. Lorenzelli, K. Yao","doi":"10.1109/VLSISP.1992.641078","DOIUrl":"https://doi.org/10.1109/VLSISP.1992.641078","url":null,"abstract":"Classical systolic design procedures rely on linear or affine space-time transformations, because of the well understood properties of linear operations. In order to increase the efficiency of the final processor, various ad hoc manipulations applied to transformations which appeared to be nonlinear at the physical array level have been proposed. Folding is one of these possible transformations. In this paper, we show that folding can actually be considered as an overall linear procedure, by artificially increasing the dimensionality of the dependence graph of the algorithm. A one-dimensional array for recursive least squares is also derived, as an application of a systematic linear design procedure including folding. INTRODUCTION Recursive least squares (RLS) problems appear in a large number of signal processing applications. In this paper, a one-dimensional array for RLS problems is derived using a systematic systolic design. By means of an artificial increase of the dimensionality of the dependence graph of the algorithm, folding is included in the design procedure, without losing the linearity of the methodology. An example of the use of RLS for an adaptive antenna application has been considered by Rader in [10]. The signals arriving at n antenna ports are sampled and stored in a real matrix X(t) with n columns. At any time instant, a new row of data is appended to X. The output y(t) is a weighted sum of the inputs, i.e., y(t) = X(t)w. The problem consists of determining the weights w ∈ ℝ^n and updating them at given times (say, every 5n new sample times). The desired weight vector satisfies the property of","PeriodicalId":210565,"journal":{"name":"Workshop on VLSI Signal Processing","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125328055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automating High Level Control Flow Transformations For Dsp Memory Management","authors":"M.F.X.B. van Swaaij, F. Franssen, F. Catthoor, H. de Man","doi":"10.1109/VLSISP.1992.641071","DOIUrl":"https://doi.org/10.1109/VLSISP.1992.641071","url":null,"abstract":"A method and a prototype tool for performing high level control flow transformations for DSP memory management are presented in this paper. Their efficacy is demonstrated by applying them to an industrial regularity detection algorithm, leading to an optimized storage scheme for the large multi-dimensional (M-D) signals within the algorithm. The data flow and control flow of the algorithm are captured by a Polyhedral Dependency Graph. This model has been designed to efficiently capture M-D data flows with a complexity that is independent from size parameters [10, 11]. Using this model, traditional loop-transformations can be generalized and captured in a control flow transformation technique that is amenable to analytical optimization. Performance figures of a CAD tool implementing the transformation method demonstrate the feasibility of this approach for the envisaged application domain.","PeriodicalId":210565,"journal":{"name":"Workshop on VLSI Signal Processing","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129182809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Data-driven Architecture For Rapid Prototyping Of High Throughput Dsp Algorithms","authors":"A. Yeung, J. Rabaey","doi":"10.1109/VLSISP.1992.641055","DOIUrl":"https://doi.org/10.1109/VLSISP.1992.641055","url":null,"abstract":"A data-driven multiprocessor architecture for rapid prototyping of complex DSP algorithms, based on direct execution of data-flow graphs, is presented. High computation bandwidth is achieved by exploiting fine-grain parallelism inherent in the target algorithms using simple processing elements interconnected by a flexible static communication network. The use of distributed control and data-driven principle of execution results in a highly scalable and modular architecture. A prototype chip, which is being designed, will contain 64 nanoprocessors and provide 32 GOPS running at 50 MHz. The benchmark results based on a variety of DSP algorithms in video processing, digital communication, digital filtering and speech recognition confirm the performance, efficiency and generality of the architecture.","PeriodicalId":210565,"journal":{"name":"Workshop on VLSI Signal Processing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121430658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VLSI Implementation Of The Realtime Image Processing Parallel Architecture Gflops","authors":"D. Houzet","doi":"10.1109/VLSISP.1992.641049","DOIUrl":"https://doi.org/10.1109/VLSISP.1992.641049","url":null,"abstract":"This paper presents the implementation of the processor of the Image Processing parallel architecture GFLOPS. This processor is a RISC/VLIW. The network module associated in the chip is such that it is possible to build a large architecture by the juxtaposition of as many chips as required. An evaluation of this architecture is presented at the end of this paper through the use of simulation results.","PeriodicalId":210565,"journal":{"name":"Workshop on VLSI Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124955321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Morphological Image Processing On Vlsi Opto-electronic Array Processors","authors":"W. Fang, T. Shaw, J. Yu","doi":"10.1109/VLSISP.1992.641060","DOIUrl":"https://doi.org/10.1109/VLSISP.1992.641060","url":null,"abstract":"A full-custom opto-electronic VLSI design for high-speed morphological image processing has been developed by combining a 2-dimensional fine-grain parallel array architecture with on-chip focal-plane photodetectors and transmitters. The processor array performs morphological functions on the opto-detected binary image with a programmable structuring element of any size. A specific language called MIPL is defined for morphological image processing and fully supported by the MIP hardware. Sophisticated morphological image processing algorithms were implemented by executing specific parallel programs (written in MIPL) on the MIP. An 8x8 array processor prototype chip has been designed in 1.2 mm x 1.2 mm silicon area using the MOSIS 2-µm CMOS process.","PeriodicalId":210565,"journal":{"name":"Workshop on VLSI Signal Processing","volume":"263 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116063534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vlsi Architectures And Implementation Of Predictive Tree-searched Vector Quantizers For Real-time Video Compression","authors":"S. Yu, R. Kolagotla, J. JáJá","doi":"10.1109/VLSISP.1992.641057","DOIUrl":"https://doi.org/10.1109/VLSISP.1992.641057","url":null,"abstract":"We describe a pipelined systolic architecture for implementing Predictive Tree-Searched Vector Quantization (PTSVQ) for real-time image and speech coding applications. This architecture uses identical processors for both the encoding and decoding processes. The overall design is regular and the control is simple. Input data is processed at a rate of 1 pixel per clock cycle, which allows real-time processing of images at video rates. We implemented these processors using 1.2 µm CMOS technology. Spice simulations indicate correct operation at 40 MHz. Prototype versions of these chips, fabricated using 2 µm CMOS technology, work at 20 MHz.","PeriodicalId":210565,"journal":{"name":"Workshop on VLSI Signal Processing","volume":"538 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132282698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Piecewise Linear Schedules For Recurrence Equations","authors":"S. Rajopadhye, L. Mui, S. Kiaei","doi":"10.1109/VLSISP.1992.641069","DOIUrl":"https://doi.org/10.1109/VLSISP.1992.641069","url":null,"abstract":"The scheduling problem for a system of affine recurrence equations (SARE) has been studied by many researchers. The emphasis has been on an important class of timing functions called linear or affine schedules. For many SAREs, linear schedules may not exist, although the SARE is computable. It will be shown that it is possible to find piecewise linear schedules (PLS) for many practical algorithms expressed in terms of SAREs. PLS have different slopes for different variables in the algorithm. For each variable, the computation domain is partitioned into finitely many “pieces” in which the schedule is different for each subdomain. The main focus of this paper is to introduce PLS and develop a synthesis procedure to find PLS for the given SARE.","PeriodicalId":210565,"journal":{"name":"Workshop on VLSI Signal Processing","volume":"26 3‐4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132340948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Concurrent Lossless Coder For Video Compression","authors":"K. Parhi, G. Shrimali","doi":"10.1109/VLSISP.1992.641058","DOIUrl":"https://doi.org/10.1109/VLSISP.1992.641058","url":null,"abstract":"The Huffman coding technique leads to loss-less coding and is used extensively in practical video compression systems. This technique assigns variable word length codes to different symbols based on the probabilities of the symbols. The unequal code word length of the symbols makes it difficult to implement high-speed Huffman decoders using pipelining and parallel processing. Previous approaches to pipelining the Huffman decoder algorithm lead to a large increase in hardware complexity. In cost-effective decoders, we need to implement high-speed decoders using little or no hardware overhead. To this end, this paper proposes a new concurrent loss-less coder which requires no hardware overhead. Our loss-less coder imposes the code word length multiplicity constraint, i.e., it forces the code word lengths to be multiples of the speedup factor. This loss-less coder leads to a slight degradation in coding performance (as measured in terms of the average code word length). We improve the performance of the concurrent coder by using conditional coding. These combined approaches lead to coding performance comparable with that of an unconditional Huffman decoder and to about a 10-fold reduction in hardware compared with previous approaches (for the same speed-up).","PeriodicalId":210565,"journal":{"name":"Workshop on VLSI Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115421264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}