{"title":"Hardware design of a Hough transform based 2-D motion estimation system","authors":"Hsiang-Ling Li, C. Chakrabarti","doi":"10.1109/VLSISP.1996.558368","DOIUrl":"https://doi.org/10.1109/VLSISP.1996.558368","url":null,"abstract":"A novel feature-domain 2D motion estimation system based on the straight-line Hough transform (SLHT) is presented. This system implements the motion estimation technique proposed by Li and Chakrabarti (see Pattern Recognition, vol.29, no.8, 1996). It operates on 256×256-pixel binary images and consists of two main blocks. The first block does the preprocessing work including smoothing the boundary, tracing and integrating the contours, and detecting dominant points. The second block computes the Hough transform on contour segments as well as the rotation and translation parameters. Each of the modules has been implemented (gate level) and simulated using Mentor Graphics tools. The experimental results are presented and compared with the results of the software implementation.","PeriodicalId":290885,"journal":{"name":"VLSI Signal Processing, IX","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129969675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model-based architectural design and verification of scalable embedded DSP systems-a RASSP approach","authors":"Lan-Rong Dung, V. K. Madisetti, J. Hines","doi":"10.1109/VLSISP.1996.558314","DOIUrl":"https://doi.org/10.1109/VLSISP.1996.558314","url":null,"abstract":"The paper describes how rapid model-year architectural synthesis (e.g., HW/SW codesign) of embedded signal processors can be performed to optimize various cost objective functions using a reuse library of models, followed by simulation based optimization. Sponsored as part of DARPA's RASSP program, this approach has developed and released a number of interoperable and verified architectural component libraries at the system level (processors, communication protocols, and topologies). While these libraries have been used in actual demonstrations of avionics and military systems, such as the MIT Lincoln Laboratory's SAR Benchmark, the F-14 legacy Infrared Search and Track System (IRST), and as part of NASA/JPL's Remote Exploration/Experimentation (REE) program studies, the authors introduce the methodology of conceptual prototyping and establish the requirements and features of the proposed environment. They also illustrate its use on some common applications with relatively sophisticated architectural building blocks, such as IEEE SCI protocol and Analog Devices' SHARC processor family.","PeriodicalId":290885,"journal":{"name":"VLSI Signal Processing, IX","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134391300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design issues for very-long-instruction-word VLSI video signal processors","authors":"S. Dutta, A. Wolfe, W. Wolf, K. O'Connor","doi":"10.1109/VLSISP.1996.558307","DOIUrl":"https://doi.org/10.1109/VLSISP.1996.558307","url":null,"abstract":"This paper is a design study of a very long instruction word (VLIW) video signal processor (VSP), concentrating on the VLSI tradeoffs which affect the processor's architecture. VLIW architectures provide high parallelism and excellent high-level language programmability, but require careful attention to VLSI design. Flexible, high-bandwidth interconnect, high-connectivity register files, and fast cycle time are required to achieve real-time video signal processing. The design targets 32-64 operations per cycle at clock rates exceeding 500 MHz. Parameterizable versions of key modules have been designed in a 0.25 µm CMOS process, allowing us to explore the VLIW VSP design space and study the tradeoffs defined by the characteristics of the process.","PeriodicalId":290885,"journal":{"name":"VLSI Signal Processing, IX","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130933924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient VLSI suited architectures for discrete wavelet transforms","authors":"S. Simon, P. Rieder, J. Nossek","doi":"10.1109/VLSISP.1996.558371","DOIUrl":"https://doi.org/10.1109/VLSISP.1996.558371","url":null,"abstract":"A variety of architectures for the discrete wavelet transform (DWT) is examined to derive an efficient VLSI implementation. The comparison leads to a lattice filter structure which uses single steps of the CORDIC algorithm. Due to the modular structure of the proposed architecture, this approach is especially suited for full custom design style using module generators to automate the manual design process.","PeriodicalId":290885,"journal":{"name":"VLSI Signal Processing, IX","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133285091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A scalable architecture for 2-D discrete wavelet transform","authors":"J.C. Limqueco, M. Bayoumi","doi":"10.1109/VLSISP.1996.558369","DOIUrl":"https://doi.org/10.1109/VLSISP.1996.558369","url":null,"abstract":"We propose an efficient and simple systolic-like architecture for VLSI implementation of a 2-D discrete wavelet transform (DWT). The \"approximation\" and \"detailed\" components of a signal are computed simultaneously in the first octave and alternately in the other octave(s). Each processing element has its own local memory for storing intermediate data and minimum routing requirement limited only to its neighbors. The proposed architecture uses the same clock frequency for every octave level and has a 100% utilization for j=2 architecture, and N²+N period cycle. The architecture is scalable for different filter lengths (divisible by 2) and different octave levels.","PeriodicalId":290885,"journal":{"name":"VLSI Signal Processing, IX","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132273168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An object based data cache with conflict free concurrent access as shared memory for a parallel DSP","authors":"J. Kneip, P. Pirsch","doi":"10.1109/VLSISP.1996.558278","DOIUrl":"https://doi.org/10.1109/VLSISP.1996.558278","url":null,"abstract":"The paper describes principle and practical implementation of an object based cache concept, allowing conflict free regular access to data structures for a cluster of processing units. The cache is based on a virtual object bound address space instead of the conventional linear address space for the access to shared data located in on-chip caches. By extending the conventional block based cache principle to 2-D blocks and using virtual addresses for address arithmetic and hit/miss detection, the time critical address calculations in the load/store pipeline can be performed fast and at low hardware cost. Transform to physical addresses is performed during block transfer between internal caches and external system memory, where it is much less time critical and must only be performed once per block. The object based cache is compiler friendly, fully transparent to the programmer, and allows the hardware efficient implementation of a shared on-chip memory system for future parallel digital image processors.","PeriodicalId":290885,"journal":{"name":"VLSI Signal Processing, IX","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114946878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-power digital filter implementations using ternary coefficients","authors":"R. Hezar, V. K. Madisetti","doi":"10.1109/VLSISP.1996.558325","DOIUrl":"https://doi.org/10.1109/VLSISP.1996.558325","url":null,"abstract":"We propose an efficient design procedure for digital FIR filters whose coefficients are restricted to the ternary set (-1, 0, +1), cascaded by a multiplication-free architecture. A dynamic programming algorithm, minimizing the instantaneous error, is also proposed to assist in the search for the optimal ternary filter coefficient set. Power reductions in a VLSI implementation appear feasible, when compared to other published approaches.","PeriodicalId":290885,"journal":{"name":"VLSI Signal Processing, IX","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127038858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time software MPEG-1 video decoder design for low-cost, low-power applications","authors":"K. Nadehara, H. Stolberg, M. Ikekawa, E. Murata, I. Kuroda","doi":"10.1109/VLSISP.1996.558376","DOIUrl":"https://doi.org/10.1109/VLSISP.1996.558376","url":null,"abstract":"This paper presents a real-time MPEG-1 video decoder implemented in software on a DSP-enhanced, 160-mW, 100-MHz, 32-bit microprocessor. The processor's DSP-oriented instructions improve the performance of generic DSP operations such as the inverse discrete cosine transform, while fast software algorithms that perform parallel operation on packed-pixel data are developed for processes unique to video decoding such as motion compensation. Furthermore, to reduce the clock count as well as the instruction count, load/store scheduling and cache miss reduction are performed. In total, the processor can achieve 30 frames/sec MPEG-1 video decoding at a cost and power dissipation (160 mW) comparable to dedicated LSIs.","PeriodicalId":290885,"journal":{"name":"VLSI Signal Processing, IX","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121065272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-radix parallel dividers for VLSI signal processing","authors":"T. Aoki, Hiroshi Tokoyo, T. Higuchi","doi":"10.1109/VLSISP.1996.558306","DOIUrl":"https://doi.org/10.1109/VLSISP.1996.558306","url":null,"abstract":"This paper presents a unified approach for designing high-radix dividers for on-line signal and data processing applications. It has long been recognized that the use of higher radices makes possible the reduction of computational steps in the division process. However most of the conventional high-radix algorithms are not suited for designing high-speed parallel dividers since they require lookup tables for selecting the quotient digits. We present a high-radix divider design that does not assume the use of lookup tables and is applicable to arbitrary radices. By prescaling the operands and converting the representation of each partial remainder into partially non-redundant representation, the quotient digit can be obtained directly from the integer part of the partial remainder. This paper also discusses the design of a radix-8 fully parallel divider as an example.","PeriodicalId":290885,"journal":{"name":"VLSI Signal Processing, IX","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121066585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LISA-machine description language and generic machine model for HW/SW co-design","authors":"V. Zivojnovic, S. Pees, Heinrich Meyr","doi":"10.1109/VLSISP.1996.558311","DOIUrl":"https://doi.org/10.1109/VLSISP.1996.558311","url":null,"abstract":"A machine description language is presented. The language, LISA, and its generic machine model are able to produce bit- and cycle/phase-accurate processor models covering the specific needs of HW/SW codesign, and cosimulation environments. The development of a new language was necessary in order to cover the gap between coarse ISA models used in compilers, and instruction set simulators on the one hand, and detailed models used for hardware design on the other. The main part of the paper is devoted to behavioral pipeline modeling. The pipeline controller of the generic machine model is represented as an ASAP (as soon as possible) sequencer parameterized by precedence and resource constraints of operations of each instruction. The standard pipeline description based on reservation tables and Gantt charts was extended by additional operation descriptors which enable the detection of data and control hazards, and permit modeling of pipeline flushes. Using the newly introduced L-charts we reduced the parameterization of the pipeline controller to a minimum and at the same time covered typical pipeline controls found in state of the art signal processors. As an example, the application of the LISA model on the TI-TMS320C54x signal processor is presented.","PeriodicalId":290885,"journal":{"name":"VLSI Signal Processing, IX","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123235121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}