{"title":"A scalable VLSI architecture for multichannel blind deconvolution and source separation","authors":"H. Pan, D. Xia, S. Douglas, K.F. Smith","doi":"10.1109/SIPS.1998.715792","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715792","url":null,"abstract":"In this paper, we describe a scalable VLSI architecture for a signal processing system that separates multiple independent source signals from a set of linear, convolved mixtures. The architecture employs a recently-proposed entropy-based algorithm and consists of a two dimensional array of interconnected chips, each of which implements a two-input, two-output signal separation system with a maximum of 255 filter coefficients per input-output channel. Chip communication is realized via separate state machines within each chip to simplify the design and enable its scalability to larger tasks. An application of the architecture to speech separation is described.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131809126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A study about FPGA-based digital filters","authors":"Javier Valls, Marcos M Peiró, T. Sansaloni, Eduardo Boemo","doi":"10.1109/SIPS.1998.715782","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715782","url":null,"abstract":"A set of operators suitable for digit-serial FIR filtering is presented. The canonical and inverted forms are studied. In each of these structures both the symmetrical and anti-symmetrical particular cases are also covered. All circuits have been implemented using an EPF10K50 Altera FPGA. The main results show that the canonical form presents less occupation and higher throughput. The 8-tap filter versions implemented can be applied in real-time processing with sample rate ranging up to 7 MHz using the bit-serial versions and up to 25 MHz with the bit-parallel ones.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130474284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The MPEG-4 video coding standard-a VLSI point of view","authors":"J. Kneip, Sven Bauer, J. Vollmer, B. Schmale, P. Kuhn, M. Reissmann","doi":"10.1109/SIPS.1998.715767","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715767","url":null,"abstract":"The paper presents an overview of the current status of the emerging MPEG-4 video coding standard and a discussion of the potential and problems for a practical implementation. Though the high flexibility of the standard suggests a software implementation on microprocessors or DSPs, a complexity analysis of the standard proved that the required processing power for a real-time codec implementation quickly reaches the limits even of future high-performance microprocessors. But even with its high number of different algorithms, the standard leaves enough design space for a successful implementation as an optimised, but flexible low-cost, low-power solution. By identifying common arithmetic and transfer properties of the algorithms involved, a partitioning into a stream, video, and composition processor is proposed. Each of the units is programmable, but dedicated to the typical requirements of each algorithm class.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132923891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Word-length optimization for high-level synthesis of digital signal processing systems","authors":"Ki-Il Kum, Wonyong Sung","doi":"10.1109/SIPS.1998.715819","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715819","url":null,"abstract":"Word-length optimization software is developed not only to reduce the hardware cost but also to minimize the optimization time. It inserts quantizers into a data flow graph representation, partitions the resultant graph, determines the minimum required word-length for each partitioned signal, conducts scheduling and binding using the minimum word-length information, and finally optimizes the word-lengths of functional units. Fixed-point simulation results are used as the performance measure; thus this method can be applied to nonlinear and time-varying algorithms. Although this approach requires iterative fixed-point simulations, the search space is reduced significantly by grouping signals using the high-level synthesis, or hardware sharing, results. A fourth-order IIR filter, a fifth-order elliptic filter, and a 12th-order adaptive LMS filter are implemented using this software. The hardware cost of functional units is reduced by 25% in the IIR filter and 7% in the elliptic filter compared to the previous results.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131122949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 16-bit parallel MAC architecture for a multimedia RISC processor","authors":"Ichiro Kuroda, Eri Murata, Kouhei Nadehara, Kazumasa Suzuki, T. Arai, Atsushi Okamura","doi":"10.1109/SIPS.1998.715773","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715773","url":null,"abstract":"This paper presents a parallel MAC (multiply-accumulation) architecture designed for DSP applications on a 200-MHz, 1.6-GOPS multimedia RISC processor. The datapath architecture of the processor is designed to realize parallel execution of a data transfer and SIMD parallel arithmetic operations. SIMD parallel 16-bit MAC instructions are introduced with a symmetric rounding scheme which maximizes the accuracy of the 18-bit accumulation. This parallel 16-bit MAC instruction on a 64-bit datapath is shown to be efficiently utilized for DSP applications such as convolution in the multimedia RISC processor. By using the parallel MAC instruction with the symmetric rounding scheme, the two-dimensional inverse discrete cosine transform (2D-IDCT) which satisfies IEEE 1180 can be implemented in 202 cycles.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134283488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Loop scheduling algorithm for timing and memory operation minimization with register constraint","authors":"F. Chen, S. Tongsima, E. Sha","doi":"10.1109/SIPS.1998.715820","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715820","url":null,"abstract":"We present a novel scheduling framework, called memory operation minimization rotation scheduling (MORS), for scheduling multi-dimensional applications subject to register constraints and other resource constraints. Under such constraints, MORS strives to shorten the schedule length while minimally inserting the load and store operations in the schedule to reduce the register requirement pressure. Experiments show that our approach is useful for reducing the schedule length without violating the register constraint of a target machine. Furthermore, the average reduction in the schedule length produced by these experiments reaches 36.6%.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131766445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computationally efficient fast algorithm and architecture for the IFFT/FFT in DMT/OFDM systems","authors":"A.-Y. Wu, Tsun-Shan Chan","doi":"10.1109/SIPS.1998.715798","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715798","url":null,"abstract":"The discrete multitone (DMT) modulation/demodulation scheme is the standard transmission technique in the application of asymmetric digital subscriber lines (ADSL). Although the DMT can achieve rate-adaptive data transmission compared with other modulation/demodulation schemes, its computational complexity is too high for cost-efficient implementations. For example, it requires 512-point IFFT/FFT as the modulation/demodulation kernel. The large block size results in heavy computational load in running programmable digital signal processors (DSP). We derive a computationally efficient fast algorithm for the IFFT/FFT. The proposed algorithm requires only 22% of the multiplications needed with the conventional butterfly approach. Also, it can avoid complex-domain operations that are inevitable in conventional IFFT/FFT computation. The resulting software function requires less MIPS count and program storage in firmware development of the IFFT/FFT module. Hence, it is very suitable for DSP-based DMT implementation. The proposed algorithm can also be applied to the technology of orthogonal frequency division multiplexing (OFDM) which is the processing kernel of the digital audio/video broadcasting (DAB/DVB) systems.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133439291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using DG2VHDL to synthesize an FPGA implementation of the 1-D discrete wavelet transform","authors":"A. Stone, E. Manolakos","doi":"10.1109/SIPS.1998.715811","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715811","url":null,"abstract":"We introduce DG2VHDL, a design tool which bridges the gap between an abstract graphical description of a DSP algorithm and its concrete hardware description language (HDL) representation. DG2VHDL automatically translates a dependence graph (DG) into a synthesizable, behavioral VHDL entity that can be input to industrial-strength behavioral compilers for producing silicon implementations of the algorithm (FPGA, ASIC). The discrete wavelet transform (DWT) was selected to demonstrate that the tool facilitates the rapid prototyping of modular parallel structures for non-trivial algorithms with non-regular data dependency structure. In addition, the DWT is an important algorithm for data compression and feature extraction, among many other real-time DSP applications. We demonstrate here that the behavioral VHDL code produced automatically by the tool leads, after behavioral synthesis, to an efficient distributed memory and control modular array architecture which can be embedded into a single FPGA.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"34 11","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120986539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-power GPS receiver design","authors":"T. Meng","doi":"10.1109/SIPS.1998.715763","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715763","url":null,"abstract":"This paper describes the design of a low-power global positioning system (GPS) receiver implemented in CMOS technology. The primary GPS ranging signal is broadcast at a frequency of 1.575 GHz, modulated by a pseudo-noise sequence at a chip rate of 1 MHz. The design of this low-power GPS receiver emphasizes the circuit techniques and architectural trade-offs employed in minimizing the energy needed for each position estimate.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"155 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123920679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and implementation of low-power DCT chip for portable multimedia terminals","authors":"L.-G. Chen, Jiun-Ying Jiu, H. Chang","doi":"10.1109/SIPS.1998.715771","DOIUrl":"https://doi.org/10.1109/SIPS.1998.715771","url":null,"abstract":"This paper describes the design and implementation of a low power 2D DCT chip for portable multimedia terminals. The chip architecture based on direct 2D approach reduces computational complexity and the power dissipation can be reduced accordingly. In the implementation of the direct 2D algorithm, a parallel distributed arithmetic (DA) architecture at reduced supply voltage is adopted. In the real circuit implementation of the chip, an adder of low power consumption is designed, as well as a power-saving ROM and a low-voltage two-port SRAM with sequential access. The resultant 2D DCT chip is realized by 0.6 μm single-poly double-metal technology. Critical path simulation indicates a maximum input rate of 133 MHz, and it consumes 138 mW at 100 MHz. The measured chip speed is around 100 MHz.","PeriodicalId":151031,"journal":{"name":"1998 IEEE Workshop on Signal Processing Systems. SIPS 98. Design and Implementation (Cat. No.98TH8374)","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126071047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}