{"title":"Synthesis of VLSI architectures for two-dimensional discrete wavelet transforms","authors":"Jongwoo Bae, V. Prasanna","doi":"10.1109/ASAP.1995.522921","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522921","url":null,"abstract":"We propose VLSI architectures with parallel I/O capability to compute the Two-Dimensional Discrete Wavelet Transform. Our design can handle large images arriving at high frame rates. A video codec based on our architecture can support multiple channels in parallel and can provide the needed performance for network based video applications. Our architecture with parallel I/O offers a solution for the low power needs of mobile/visual communication systems. Our architecture employs block-based I/O and a dual memory buffer to store intermediate results to schedule the filter operations. This leads to a high throughput rate of n pixels per clock cycle and a small memory size of j(l-1)/(N+n)+2n/sup 2/, for an N/spl times/N input image, where n/spl times/n is the block size, l is the alter length, and j is the number of octaves. The resulting architecture has a latency of 2l+n for each octave and a total execution time of N/sup 2//n+2l+n+3jn.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122861174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synthesis of multirate VLSI arrays","authors":"P. Lenders, S. Rajopadhye","doi":"10.1109/ASAP.1995.523640","DOIUrl":"https://doi.org/10.1109/ASAP.1995.523640","url":null,"abstract":"Many applications in signal and image processing can be implemented on regular VLSI architectures such as systolic arrays. Multirate arrays, or MRAs are an extension of systolic arrays where different data streams propagate with different clocks. It is known that they can be modelled as systems of uniform recurrence equations over sparse polyhedral domains. Using well known linear index transformation rules for systems of affine recurrence equations, or SAREs, we show that MRAs constitute a particular proper subset of SAREs. We describe how an MRA can be systematically derived from an initial specification in the form of a mathematical equation. The main transformation that we use is dependency decomposition, and rue illustrate our method by deriving a hitherto unknown decimation filter array that improves upon the hardware cost of previously published filters.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114761118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data alignments for modular time-space mappings of BLAS-like algorithms","authors":"Hyuk-Jae Lee, J. Fortes","doi":"10.1109/ASAP.1995.522903","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522903","url":null,"abstract":"Modular time-space transformations have been recently proposed for algorithm mappings that cannot be described by affine functions. This paper extends affine data alignments to a new class of data alignments, called expanded modular data alignments (EMDAs), for algorithms that are mapped by modular time-space transformations. An EMDA is a set of modular data alignments (MDAs) which are described by affine functions module a constant vector. With an EMDA, multiple copies of a data array are mapped into target processors by different modular data alignments (MDAs) and therefore can be efficiently used with modular time-space transformations which may require several operations to access the same data at the same time. Conditions of EMDAs that guarantee local access of data entries are provided. These conditions cover initial data alignment, data movement during the computation, and the number of copies required to avoid unnecessary communications. These conditions can be used to derive the EMDA for a given modular mapping or to generate a modular mapping for a given EMDA so that communication due to data misalignment does not occur. Several examples are given to show that EMDAs are well suited for modular time-space mappings.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121806175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Motion estimation algorithms on fine grain array processors","authors":"Heung-Nam Kim, M. J. Irwin, R. Owens","doi":"10.1109/ASAP.1995.522924","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522924","url":null,"abstract":"Motion estimation plays a key role in video coding, (e.g., video telephone, MPEG, HDTV). Among the previous motion estimation algorithms, full-search block matching algorithms (BMA) are preferred because of their simplicity and lower control overhead when those algorithms are implemented in VLSI array processors. Previous full-search BMAs have considered one block matching at a time. There exist, however shared data in the search areas for adjacent template blocks. Therefore, if we process adjacent template blocks in parallel, we can reduce the data memory accesses for the shared data. In this paper we propose a new dataflow scheme for the efficient, systolic, full-search BMA on programmable array processors so that we can process as many adjacent template blocks as possible in unison in order to reduce the data memory accesses. We present an efficient implementation of the BMA on the Micro Grained Array Processor (MGAP) which is a fine-grained mesh-connected programmable VLSI array processor being developed at Penn State University. As a result, the BMA for the MPEG SIF video format (352/spl times/240 pixels) with a block size of 16/spl times/16 pixels, displacement range of 16 pixels, frame rate of 30 frames/sec can be computed at a real time processing rate on the MGAP.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126877588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The systolic design of a block regularised parameter estimator using hierarchical signal flow graphs","authors":"D. W. Brown, F. Gaston","doi":"10.1109/ASAP.1995.522917","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522917","url":null,"abstract":"Hierarchical Signal Flow Graphs (HSFGs) am used to illustrate the computations and the data flow required for the block regularised parameter estimation algorithm. This algorithm protects the parameter estimation from numerical difficulties associated with insufficiently exciting data or where the behaviour of the underlying model is unknown. Hierarchical signal flow graphs (HSFGs) aid the user's understanding of the algorithm as they clearly show how the algorithm differs from exponentially weighted recursive least squares, but also allow the user to develop fast efficient parallel algorithms easily and effectively, as demonstrated.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129049928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Osorio, E. Antelo, J. Bruguera, J. Villalba, E. Zapata
{"title":"Digit on-line large radix CORDIC rotator","authors":"R. Osorio, E. Antelo, J. Bruguera, J. Villalba, E. Zapata","doi":"10.1109/ASAP.1995.522929","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522929","url":null,"abstract":"Many applications figure the evaluation of rotations at high speeds. However there is a trade-off between the chip area and the latency. In this paper we develop a digit on-line pipelined array architecture based on the radix-4 CORDIC algorithm in rotation mode. The radix-4 CORDIC algorithm halves the number of microrotations with respect the traditionally radix-2 algorithm with the drawback of a non-constant scale factor. Seeking a good compromise between silicon area and latency we have used digit on-line processing. This way the data inputs the processor in blocks of bits (digits) in MSD-first mode of processing. We have used redundant carry-save arithmetic to allow carry-free additions and on-line processing. The designed processor demonstrates to have a better performance than previous digit on-line architectures.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131774252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Villalba, José Antonio Hidalgo López, E. Zapata, E. Antelo, J. Bruguera
{"title":"CORDIC architectures with parallel compensation of the scale factor","authors":"J. Villalba, José Antonio Hidalgo López, E. Zapata, E. Antelo, J. Bruguera","doi":"10.1109/ASAP.1995.522930","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522930","url":null,"abstract":"The compensation of scale factor imposes significant computation overhead on the CORDIC algorithm. In this paper we will propose two algorithms and architectures in order to perform the compensation of the scale factor in parallel with the computation of the CORDIC iterations. This way it is not necessary to carry out the final multiplication or add scaling iterations in order to achieve the compensation. With the architectures we propose the dependence on n of the compensation of the scale factor disappears, and this considerably reduces the latency of the system. The architectures developed are optimized solutions for the different operating modes of the CORDIC both in conventional and in redundant arithmetic.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134628285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel sequence comparison and alignment","authors":"R. Hughey","doi":"10.1109/ASAP.1995.522916","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522916","url":null,"abstract":"Sequence comparisons, a vital research tool in computational biology, is based on a simple O(n/sup 2/) algorithm that easily maps to a linear array of processors. This paper reviews and compares high-performance sequence analysis on general-purpose supercomputers and single-purpose, reconfigurable, and programmable co-processors. The difficulty of comparing hardware from published performance figures is also noted.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123672074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Systolic filter for fast DNA similarity search","authors":"P. Guerdoux-Jamet, D. Lavenier","doi":"10.1109/ASAP.1995.522918","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522918","url":null,"abstract":"This paper presents a systolic filter for speeding up the scan of DNA databases. The filter acts as a co-processor which performs the more intensive computations occurring during the process. Our validation, based on a FPGA prototype board tightly connected to a workstation, has shown that the filter may boost the performance of the machine by a factor ranging from 50 to 400 over current workstations.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114928301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A scalable halftoning coprocessor architecture","authors":"A. Kugler, R. Hersch","doi":"10.1109/ASAP.1995.522907","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522907","url":null,"abstract":"Exact-angle superscreen dithering requires large dither tiles. Since storing precomputed screen elements for each intensity level would require too much memory, dithering must be executed on the fly at halftoning time. For this purpose a dithering coprocessor is presented which generates halftoned images at high speed. The proposed hardware architecture is based on a pipelined and scalable design which speeds up halftoning by a factor of twenty compared with modern RISC software-based solutions. We describe the architecture of the coprocessor and show to what extent it can be scaled for improving performances. The proposed coprocessor could find applications in digital color copiers which need to print scanned color images at high speed.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114957411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}