{"title":"A scalable halftoning coprocessor architecture","authors":"A. Kugler, R. Hersch","doi":"10.1109/ASAP.1995.522907","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522907","url":null,"abstract":"Exact-angle superscreen dithering requires large dither tiles. Since storing precomputed screen elements for each intensity level would require too much memory, dithering must be executed on the fly at halftoning time. For this purpose a dithering coprocessor is presented which generates halftoned images at high speed. The proposed hardware architecture is based on a pipelined and scalable design which speeds up halftoning by a factor of twenty compared with modern RISC software-based solutions. We describe the architecture of the coprocessor and show to what extent it can be scaled for improving performances. The proposed coprocessor could find applications in digital color copiers which need to print scanned color images at high speed.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114957411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Systolic filter for fast DNA similarity search","authors":"P. Guerdoux-Jamet, D. Lavenier","doi":"10.1109/ASAP.1995.522918","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522918","url":null,"abstract":"This paper presents a systolic filter for speeding up the scan of DNA databases. The filter acts as a co-processor which performs the more intensive computations occurring during the process. Our validation, based on a FPGA prototype board tightly connected to a workstation, has shown that the filter may boost the performance of the machine by a factor ranging from 50 to 400 over current workstations.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114928301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A simple array processor for binary prefix sums","authors":"R. Lin, S. Olariu","doi":"10.1109/ASAP.1995.522911","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522911","url":null,"abstract":"The task of computing the prefix sums of a binary sequence (BPS, for short) arises frequently in expression evaluation, data and storage compaction, routing, processor assignment, and operating system design. The main goal of this work is to propose an efficient special-purpose architecture for the BPS problem. Our design exploits a novel and elegant idea that allows us to considerably reduce the number of processors of the best-known design. The resulting design is simple and intuitive and scales easily to handle input sequences of various sizes.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124940945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The naive execution of affine recurrence equations","authors":"D. Wilde, S. Rajopadhye","doi":"10.1109/ASAP.1995.522900","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522900","url":null,"abstract":"In recognition of the fundamental relation between regular arrays and systems of affine recurrence equations, the ALPHA language was developed as the basis of a computer aided design methodology for regular array architectures. ALPHA is used to initially specify algorithms at a very high algorithmic level. Regular array architectures can then be derived from the algorithmic specification using a transformational approach supported by the ALPHA environment. This design methodology guarantees the final design to be correct by construction, assuming the initial algorithm was correct. In this paper, we address the problem of validating an initial specification. We demonstrate a translation methodology which compiles ALPHA into the imperative sequential language C. The C-code may then be compiled and executed to test the specification. We show how an ALPHA program can be naively implemented by viewing it as a set of monolithic arrays and their filing functions, implemented using applicative caching. This is the approach which is used by the translator. We discuss two problems that had to be solved before implementing the translator. The first is how to allocate 1-dimensional storage for a polyhedron, and the second is how to scan a polyhedron with nested loops.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114378733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimizing synchronization overhead in statically scheduled multiprocessor systems","authors":"S. Bhattacharyya, S. Sriram, Edward A. Lee","doi":"10.1109/ASAP.1995.522934","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522934","url":null,"abstract":"Synchronization overhead can significantly degrade performance in embedded multiprocessor systems. This paper develops techniques to determine a minimal set of processor synchronizations that are essential for correct execution in an embedded multiprocessor implementation. Our study is based in the context of self-timed execution of iterative dataflow programs; dataflow programming in this form has been applied extensively, particularly in the context of signal processing software. Self-timed execution refers to a combined compile-time/run-time scheduling strategy in which processors synchronize with one another only based on inter-processor communication requirements, and thus, synchronization of processors at the end of each loop iteration does not generally occur. We introduce a new graph-theoretic framework, based on a data structure called the synchronization graph, for analyzing and optimizing synchronization overhead in self-timed, iterative dataflow programs. We also present an optimization that involves converting a synchronization graph that is not strongly connected into a strongly connected graph.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125642153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Column compression pipelined multipliers","authors":"L. Breveglieri, L. Dadda, V. Piuri","doi":"10.1109/ASAP.1995.522909","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522909","url":null,"abstract":"The paper presents a study on the introduction of pipelining in parallel VLSI multipliers, built according to the column compression (CC) design techniques. A number of CC multiplier schemes have been proposed in the literature, aimed at reducing the number of stages of adders necessary to compute a multiplication. More recently CC multiplier schemes aimed at optimising the required silicon area, the regularity and the locality of the interconnections among the adders, have been proposed. The paper affords the introduction of pipelining in these last structures and compares the obtained results with existing structures, in terms of required number of components and operation frequency.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134512726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VLSI algorithms for compressed pattern search using tree based codes","authors":"A. Mukherjee, T. Acharya","doi":"10.1109/ASAP.1995.522915","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522915","url":null,"abstract":"Data compression methods are used to reduce the redundancy in data representation in order to decrease the data storage requirements and communication costs. In order to exploit the benefits of data compression to conserve internal processor storage and computation resources, it is desirable to perform operations on compressed data without decompressing it. We present hardware algorithms and VLSI implementation of a chip to search a compressed text with respect to keys or patterns in compressed form using Huffman-type tree-based codes.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132484371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Input buffering requirements of a systolic array for the inverse discrete wavelet transform","authors":"R. Lang, A. Spray","doi":"10.1109/ASAP.1995.522920","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522920","url":null,"abstract":"The Discrete Wavelet Transform (DWT) is a signal processing technique popularised by its results in data compression. Considerable work has been done in designing novel architectures to perform the DWT, including a systolic architecture designed by the authors, but little attention has been given to the inverse DWT which is needed in applications such as data compression for signal reconstruction. Despite the fact that the inverse DWT is computationally the reverse of the DWT, the hardware design for the architecture is not simply mirrored. Existing designs expect the architecture for the inverse DWT to be a simple follow-on step from the DWT design, however this is not the case. We present one such problem here, showing the FIFO buffering required on the input of the inverse architecture. We show how the size of this buffer can be calculated and compare it to a fixed data array implementation. This work is based on our systolic array design and is an integral part of the inverse DWT design we are working on for image and video compression.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125478391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}