{"title":"Hierarchical specification and transformation of algorithms and VLSI-architectures","authors":"U. Arzt, L. Thiele","doi":"10.1109/VLSISP.1994.574729","DOIUrl":"https://doi.org/10.1109/VLSISP.1994.574729","url":null,"abstract":"In this paper the specification and processing of data (type) hierarchy is integrated in the transformative approach of the design of application-specific circuits. The processing of the type hierarchy is solved by program transformations, which can be proven correct.","PeriodicalId":427356,"journal":{"name":"Proceedings of 1994 IEEE Workshop on VLSI Signal Processing","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124884303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new postprocessing architecture for soft output Viterbi decoding","authors":"O. Joeressen, H. Meyr","doi":"10.1109/VLSISP.1994.574758","DOIUrl":"https://doi.org/10.1109/VLSISP.1994.574758","url":null,"abstract":"Soft output Viterbi decoding has evolved as a key technology for advanced decoding systems in recent years. The article presents a modification of the soft output Viterbi algorithm and the resulting hardware architecture. The new approach leads to storage savings while maintaining the low computational complexity of former approaches and is thus advantageous for hardware as well as for software implementations. The complexity of soft output Viterbi decoding with the new approach is clearly less than twice that of hard decision Viterbi decoding.","PeriodicalId":427356,"journal":{"name":"Proceedings of 1994 IEEE Workshop on VLSI Signal Processing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121144043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed memory and control VLSI architectures for the 1-D Discrete Wavelet Transform","authors":"J. Fridman, E. Manolakos","doi":"10.1109/VLSISP.1994.574763","DOIUrl":"https://doi.org/10.1109/VLSISP.1994.574763","url":null,"abstract":"We address the synthesis of fast, efficient and regular computational structures for the Discrete Wavelet Transform (DWT) algorithm, using linear space-time mapping and constraint driven localization techniques. Index space transformations are used to regularize the DWT algorithm and to avoid data collisions due to multiprojection. A summary of the data dependence and localization analysis is presented, as well as an array of L Processing Elements (PEs) for computing any J-octave DWT decomposition with latency of M, where L is the wavelet filter length and M is the input sequence length. The latency is independent of the highest computable octave J, for any value of J, and the efficiency is nearly optimal and independent of M. The proposed design is the fastest parallel implementation of the 1-D DWT with L PEs that we know of.","PeriodicalId":427356,"journal":{"name":"Proceedings of 1994 IEEE Workshop on VLSI Signal Processing","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125758619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of parallel image processing algorithms in the Cloner environment","authors":"J. N. Patel, A. Khokhar, L. Jamieson","doi":"10.1109/VLSISP.1994.574733","DOIUrl":"https://doi.org/10.1109/VLSISP.1994.574733","url":null,"abstract":"Cloner is a prototyping environment for computer vision and image processing (CVIP) algorithms and tasks. It is being designed to allow users to take advantage of the computing power provided by parallel processing systems without requiring an extensive understanding of the underlying architecture. In this paper, we focus on the use of Cloner to achieve high-performance implementations for a class of low-level CVIP algorithms.","PeriodicalId":427356,"journal":{"name":"Proceedings of 1994 IEEE Workshop on VLSI Signal Processing","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132333110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable architectures for high speed channel decoding","authors":"H. Dawid, H. Meyr","doi":"10.1109/VLSISP.1994.574747","DOIUrl":"https://doi.org/10.1109/VLSISP.1994.574747","url":null,"abstract":"At present, channel decoding and soft output channel decoding of convolutional codes are key technologies for advanced communication systems. The speed of any implementation of the corresponding decoding algorithms, the Viterbi algorithm (VA) and the soft output VA (SOVA) is limited by an inherent nonlinear recursion. In contrast this paper deals with scalable architectures for purely feedforward decoding algorithms, the \"minimized method\" parallelized Viterbi decoding algorithm and the parallel MAP (Maximum A Posteriori) soft output decoding algorithm. A unified treatment is possible since these algorithms and the corresponding dependence graphs (DGs) are very similar. In order to obtain a scalable throughput adapted to a given specification, a hierarchical resource sharing methodology exploiting the inherent DG regularity is proposed and implemented as a VHDL generator.","PeriodicalId":427356,"journal":{"name":"Proceedings of 1994 IEEE Workshop on VLSI Signal Processing","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132270306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2-D discrete cosine transforms on a fine grain array processor","authors":"Heung-Nam Kim, M. Borah, R. Owens, M. J. Irwin","doi":"10.1109/VLSISP.1994.574760","DOIUrl":"https://doi.org/10.1109/VLSISP.1994.574760","url":null,"abstract":"The 2-D DCT has been an industry standard in image data compression. Since its first introduction, a number of fast algorithms and techniques have been introduced. Most of them were implemented using specialized VLSI chips. In this paper we present an efficient systolic 2-D DCT algorithm on a 2-D mesh fine-grained array processor. Our algorithm reads non-skewed input subimages and generates the output in non-skewed form with only a small number of extra processors. It uses the minimum number of multiplications by employing modified small n algorithms. Our implementation of the 2-D DCT on the Micro Grained Array Processor (MGAP), which is a fine-grained and mesh-connected array processor being developed at the Penn State University, exploits massive parallelism. As a result, the 2-D DCT of size 8×8 and 16×16 pixels for 256×256 pixel images can be computed at real-time processing rates.","PeriodicalId":427356,"journal":{"name":"Proceedings of 1994 IEEE Workshop on VLSI Signal Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115727109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What is a multirate array?","authors":"P. Lenders, S. Rajopadhye","doi":"10.1109/VLSISP.1994.574768","DOIUrl":"https://doi.org/10.1109/VLSISP.1994.574768","url":null,"abstract":"Multirate arrays (MRAs) are an extension of systolic arrays where different data variables are propagated with different clocks. Recently, methods for MRA synthesis starting from Affine Recurrence Equations (AREs) have been proposed. In this paper we give a formal definition of MRAs as systems of Uniform Recurrence Equations (UREs) defined over sparse polyhedral domains. We then show a direct equivalence between the previously proposed synthesis methods and a simple index transformation of sparse UREs.","PeriodicalId":427356,"journal":{"name":"Proceedings of 1994 IEEE Workshop on VLSI Signal Processing","volume":"136 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123446781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance analysis of mixed asynchronous synchronous systems","authors":"J. Teich, S. Sriram, L. Thiele, M. Martín","doi":"10.1109/VLSISP.1994.574735","DOIUrl":"https://doi.org/10.1109/VLSISP.1994.574735","url":null,"abstract":"The paper is concerned with the timing analysis of a class of digital systems called mixed asynchronous-synchronous systems. In such a system, each computation module is either synchronous (i.e. clocked) or asynchronous (i.e. self-timed). The communication between modules is assumed to be self-timed for all modules. We introduce a graph model called MASS for describing the timing behaviour of such architectures. The graph contains two kinds of nodes, synchronous and asynchronous nodes. The operation model of a MASS is similar to that of a timed marked graph; however, additional schedule constraints are imposed on synchronous nodes: a synchronous node can only fire at ticks of its local module clock. We analyze the behaviour of MASS, in particular period, periodicity and maximal throughput rate.","PeriodicalId":427356,"journal":{"name":"Proceedings of 1994 IEEE Workshop on VLSI Signal Processing","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127372688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiprocessor scheduling with a priori node assignment","authors":"V. Zivojnovic, H. Koerner, H. Meyr","doi":"10.1109/VLSISP.1994.574739","DOIUrl":"https://doi.org/10.1109/VLSISP.1994.574739","url":null,"abstract":"Compile-time scheduling of DSP programs on multiprocessor systems is discussed. Contrary to standard approaches, a complete, a priori node assignment is supposed. The assumption is justified for coarse-grain DSP programs on heterogeneous programmable architectures with dedicated memory, I/O or accelerator units. The a priori information about cut arcs is used to apply the retiming transformation for the minimization of the schedule length. Experimental results show that the obtained improvement is worth the additional complexity which is introduced by retiming. At the end, issues related to the implementation of retimed DSP programs are discussed.","PeriodicalId":427356,"journal":{"name":"Proceedings of 1994 IEEE Workshop on VLSI Signal Processing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132505800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Area-time high level synthesis laws: theory and practice","authors":"M. Potkonjak, J. Rabaey","doi":"10.1109/VLSISP.1994.574730","DOIUrl":"https://doi.org/10.1109/VLSISP.1994.574730","url":null,"abstract":"We introduce three AT DSP high level synthesis laws that relate different components of the area of ASIC implementation cost, namely foreground memory, execution units, and interconnect, to the sampling period (available time). The laws state that A=const, AT=const, and AT^2=const for the area of registers, execution units, and interconnect, respectively. We validate the AT laws using case studies and statistical analysis of synthesis results of 80 real life designs. Several applications of the AT laws for development of high level synthesis tools are presented. Use of the AT high level synthesis laws as an effective method for encapsulation of high level synthesis knowledge is also studied. The effectiveness of the AT laws applications is documented on numerous designs.","PeriodicalId":427356,"journal":{"name":"Proceedings of 1994 IEEE Workshop on VLSI Signal Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121114323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}