{"title":"MOVIE: a building block for the design of real time simulator of moving pictures compression algorithms","authors":"R. Barzic, C. Bouville, François Charot, Gwendal Le Fol, P. Lemonnier, Charles Wagner","doi":"10.1109/ASAP.1995.522923","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522923","url":null,"abstract":"This paper shows how a real-time simulator of moving pictures compression algorithms can be rapidly assembled using a basic building block, here called MOVIE (MOdule for Video Experimentation). The internal architecture of the MOVIE VLSI chip can be compared to a small systolic machine made of a 32-bit I/O processor, a reduced linear array of 16-bit computation processors and data video input/output mechanisms. Externally, the chip is provided with four 16-bit bidirectional data ports and three 16-bit bidirectional data video ports. Several MOVIE chips can be easily clustered to allow the size of the linear array of computation processors to be increased. The MOVIE chip is fully programmable in a high level language in order to make program developments easier.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126092904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interfacing FPGA/VLSI processor arrays","authors":"Joseph A. Fernando, Jack S. N. Jean","doi":"10.1109/ASAP.1995.522927","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522927","url":null,"abstract":"Mapping DSP algorithms to FPGA/VLSI circuits is an important issue in Application-Specific Array Processor design. Since a DSP algorithm can be abstracted as a graph where each node is a shift-invariant DG (Dependence Graph) and the edges denote the data flow, it is possible to map a DSP algorithm to a set of processor arrays with some interface circuits. The interface design depends on the projection/scheduling vectors used on the two corresponding shift-invariant DGs and the interfacing cost is very significant when a lot of delays are necessary or when the processor operations are relatively inexpensive in terms of area. Therefore, when selecting these vectors in a design environment, the effect on the interface circuit must be accurately computed. In this paper, various interface circuit designs are presented and categorized based on the data conversion requirement. An algorithm to select a design from many design options to minimize the cost is also described.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121434469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
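The interfacing record notes that interface cost grows when many delays are needed between two arrays with different projection/scheduling vectors. A toy sketch of where those delays come from, under loudly stated assumptions (not the paper's interface designs or its selection algorithm): both shift-invariant DGs share one rectangular index domain, the datum produced at index point p by the first array at time `s_prod·p` is consumed at the same p by the second array at time `s_cons·p + offset`, and the interface buffers it in between. The functions `dot` and `interface_delays` are hypothetical names.

```python
from itertools import product
from typing import Sequence, Tuple


def dot(u: Sequence[int], v: Sequence[int]) -> int:
    """Integer dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def interface_delays(s_prod: Sequence[int], s_cons: Sequence[int],
                     bounds: Sequence[int]) -> Tuple[int, int]:
    """Toy model (hypothetical): return (offset, worst-case wait in time steps).

    Index point p is produced at dot(s_prod, p) and consumed at
    dot(s_cons, p) + offset. The offset is the smallest constant for which
    every datum is consumed at least one step after it is produced.
    """
    points = list(product(*(range(n) for n in bounds)))
    offset = max(dot(s_prod, p) - dot(s_cons, p) for p in points) + 1
    waits = [dot(s_cons, p) + offset - dot(s_prod, p) for p in points]
    return offset, max(waits)
```

With matching schedules on a 4×4 domain, `interface_delays((1, 1), (1, 1), (4, 4))` returns `(1, 1)`, a one-step hand-off; changing the consumer to `(1, 2)` returns `(1, 4)`, so some data must wait four steps. The worst-case wait is only a rough proxy for the delay registers an interface circuit would need.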
{"title":"Revisiting the decomposition of Karp, Miller and Winograd","authors":"A. Darte, F. Vivien","doi":"10.1109/ASAP.1995.522901","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522901","url":null,"abstract":"This paper is devoted to the construction of multi-dimensional schedules for a system of uniform recurrence equations. We show that this problem is dual to the problem of computability of a system of uniform recurrence equations. We propose a new study of the decomposition algorithm first proposed by Karp, Miller and Winograd: we base our implementation on linear programming resolutions whose duals give exactly the desired multi-dimensional schedules. Furthermore, we prove that the schedules built this way are optimal up to a constant factor.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131788762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time-optimal ranking algorithms on sorted matrices","authors":"V. Bokka, H. Gurla, S. Olariu, J. Schwing, L. Wilson","doi":"10.1109/ASAP.1995.522904","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522904","url":null,"abstract":"Answering rank queries is a recurring operation in various application domains including geographic data processing, information retrieval, database design, information management, and medical image processing. Many of these applications involve data stored in a matrix satisfying a number of properties. One property that occurs time and again in applications specifies that the rows and the columns of the matrix are independently sorted. It is customary to refer to such a matrix as sorted. An instance of the Batched Ranking problem (BR, for short) involves a sorted matrix A of items from a totally ordered universe, along with a collection Q of queries of the following type: for a query q_j, one is interested in the number of items in A that are smaller than q_j. The BR problem asks for solving all queries in Q. In this work, we consider the BR problem in the following context: the matrix A is pretiled, one item per processor, onto an enhanced mesh of size √n×√n; the m queries are stored, one per processor, in the first m/√n columns of the platform. Our main contribution is twofold. First, we show that any algorithm that solves the BR problem must take at least Ω(log n+√m) time in the worst case. Second, we show that this time lower bound is tight on meshes of size √n×√n enhanced with multiple broadcasting, by exhibiting an algorithm solving the BR problem in O(log n+√m) time on such a platform.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"2010 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124990882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
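The Batched Ranking record defines a rank query on a sorted matrix as counting the items smaller than q_j. A minimal sequential sketch of that definition uses the standard staircase walk, which exploits the independently sorted rows and columns to answer one query in O(rows + cols) comparisons. This is only a single-processor baseline for what a query computes, not the authors' mesh algorithm with multiple broadcasting; `rank_in_sorted_matrix` and `batched_ranking` are hypothetical names.

```python
from typing import List, Sequence


def rank_in_sorted_matrix(a: Sequence[Sequence[int]], q: int) -> int:
    """Count entries smaller than q in a matrix whose rows and columns are sorted ascending."""
    rows = len(a)
    if rows == 0:
        return 0
    cols = len(a[0])
    count = 0
    r, c = 0, cols - 1  # start at the top-right corner
    while r < rows and c >= 0:
        if a[r][c] < q:
            # Row r is sorted, so a[r][0..c] are all < q; columns right of c
            # were already eliminated for this and all later rows.
            count += c + 1
            r += 1
        else:
            # Column c is sorted, so a[r..][c] are all >= q.
            c -= 1
    return count


def batched_ranking(a: Sequence[Sequence[int]], queries: Sequence[int]) -> List[int]:
    """Answer every query in the batch independently (sequential baseline)."""
    return [rank_in_sorted_matrix(a, q) for q in queries]
```

For example, with `a = [[1, 3, 5], [2, 4, 6], [7, 8, 9]]`, `batched_ranking(a, [0, 5, 10])` returns `[0, 4, 9]`. Run sequentially, m queries cost O(m·√n) on a √n×√n matrix, which is the kind of cost the paper's O(log n+√m) mesh algorithm improves on by processing the queries in parallel.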
{"title":"Multilayer cellular algorithm for complex number multiplication","authors":"V. Markova","doi":"10.1109/ASAP.1995.522933","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522933","url":null,"abstract":"A new multilayer cellular algorithm for complex number multiplication is presented. The upper estimate of the time complexity is obtained. The design is based on an original model of distributed computation which is called Parallel Substitution Algorithm.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115261330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of parallel arithmetic in a cellular automaton","authors":"R. Squier, K. Steiglitz, Mariusz H. Jakubowski","doi":"10.1109/ASAP.1995.522928","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522928","url":null,"abstract":"We describe an approach to parallel computation using particle propagation and collisions in a one-dimensional cellular automaton, based on a particle model called a Particle Machine (PM). Such a machine has the parallelism, structural regularity, and local connectivity of systolic arrays, but is general and programmable. It contains no explicit multipliers, adders, or other fixed arithmetic operations; these are implemented using fine-grain interactions of logical particles which are injected into the medium of the cellular automaton, and which represent both data and processors. We give parallel, linear-time implementations of addition, subtraction, multiplication and division.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125321255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Precise tiling for uniform loop nests","authors":"P. Calland, T. Risset","doi":"10.1109/ASAP.1995.522937","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522937","url":null,"abstract":"The subject of this article is a hyperplane partitioning problem applied to perfect loop nests. This work is aimed at increasing the computation granularity to reduce the overhead due to communication. This study differs from previous work in that it takes redundant communication into account. We propose an algorithm that gives the optimal solution, and present various examples to show the validity of the approach.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117062734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An array processor for inner product computations using a Fermat number ALU","authors":"W. Luo, G. Jullien, N. Wigley, W. Miller, Zhongde Wang","doi":"10.1109/ASAP.1995.522931","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522931","url":null,"abstract":"This paper explores an architecture for parallel independent computations of inner products over the direct product ring ℜ_{257×17}. The structure is based on the polynomial mapping of the Modulus Replication RNS for calculations over dynamic ranges much larger than the product of the computational moduli. We show that the computational ring is optimal for our purposes, and introduce basic cells for the efficient calculation of all elements of the polynomial ring computations.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115804693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
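The Fermat number ALU record works over a ring built from the moduli 257 = 2^8+1 and 17 = 2^4+1, both Fermat numbers. A minimal sketch of the underlying residue idea: compute the inner product independently modulo each modulus, then recombine with the Chinese Remainder Theorem into the range of 257·17 = 4369. This is plain two-modulus RNS, not the paper's Modulus Replication RNS polynomial mapping, which is what extends the dynamic range well beyond 4369. The sketch is only valid when the true result lies in [-2184, 2184], and all function names are hypothetical.

```python
from typing import Sequence

M1, M2 = 257, 17  # Fermat numbers 2**8 + 1 and 2**4 + 1
M = M1 * M2       # 4369, the dynamic range of plain two-modulus RNS


def inner_product_mod(x: Sequence[int], y: Sequence[int], m: int) -> int:
    """Inner product reduced modulo m after every multiply-accumulate."""
    acc = 0
    for a, b in zip(x, y):
        acc = (acc + a * b) % m  # Python's % keeps acc in [0, m)
    return acc


def crt_pair(r1: int, r2: int) -> int:
    """Return the unique v in [0, M) with v = r1 (mod M1) and v = r2 (mod M2)."""
    inv = pow(M1, -1, M2)  # modular inverse of M1 mod M2 (Python 3.8+)
    k = ((r2 - r1) * inv) % M2
    return r1 + M1 * k


def inner_product_rns(x: Sequence[int], y: Sequence[int]) -> int:
    """Signed inner product via residues mod 257 and 17 (|result| <= M // 2 assumed)."""
    v = crt_pair(inner_product_mod(x, y, M1), inner_product_mod(x, y, M2))
    return v - M if v > M // 2 else v  # map to the symmetric range
```

For example, `inner_product_rns([3, -4, 5], [10, 20, -2])` returns `-60`: the residues are 197 (mod 257) and 8 (mod 17), which `crt_pair` combines into 4309 = 4369 - 60. In hardware, the appeal of Fermat moduli is that reduction modulo 2^k+1 needs only shifts and adds; the sketch does not model that.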
{"title":"A processor-time-minimal schedule for 3D rectilinear mesh algorithms","authors":"C. Scheiman, P. Cappello","doi":"10.1109/ASAP.1995.522902","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522902","url":null,"abstract":"The paper, using a directed acyclic graph (dag) model of algorithms, investigates precedence constrained multiprocessor schedules for the n_x×n_y×n_z directed rectilinear mesh. Its completion requires at least n_x+n_y+n_z-2 multiprocessor steps. Time-minimal multiprocessor schedules that use as few processors as possible are called processor-time-minimal. Lower bounds are shown for the n_x×n_y×n_z directed mesh, and these bounds are shown to be exact by constructing a processor-time-minimal multiprocessor schedule that can be realized on a systolic array whose topology is either a two-dimensional mesh or skewed cylinder. The contribution of this paper is two-fold: It generalizes the previous work on cubical mesh algorithms, and it presents a more elegant mathematical method for deriving processor-time lower bounds for such problems.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133653401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
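The 3D mesh record states that the n_x×n_y×n_z directed mesh needs at least n_x+n_y+n_z-2 steps. A small sketch of why: with nodes (i, j, k) indexed from 0 and edges that increment one coordinate, the earliest-start (wavefront) schedule t(i, j, k) = i+j+k respects every precedence and uses exactly that many distinct steps, matching the longest path. The widest wavefront it produces is only an upper bound on the processor count of a time-minimal schedule; the paper's processor-time-minimal schedule is a different construction. `wavefront_schedule` is a hypothetical name.

```python
from collections import Counter
from typing import Tuple


def wavefront_schedule(nx: int, ny: int, nz: int) -> Tuple[int, int]:
    """Return (steps, processors) for the schedule t(i, j, k) = i + j + k.

    Every mesh edge goes from (i, j, k) to a node with one coordinate
    incremented, so it advances t by exactly one step.
    """
    levels = Counter(i + j + k
                     for i in range(nx)
                     for j in range(ny)
                     for k in range(nz))
    steps = len(levels)                # time steps 0 .. nx+ny+nz-3
    processors = max(levels.values())  # nodes in the widest wavefront
    return steps, processors
```

For a 3×3×3 mesh, `wavefront_schedule(3, 3, 3)` returns `(7, 7)`: 7 = 3+3+3-2 steps, with the middle wavefront i+j+k = 3 holding 7 nodes that this schedule runs concurrently.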
{"title":"Techniques for yield enhancement of VLSI adders","authors":"Zhan Chen, I. Koren","doi":"10.1109/ASAP.1995.522926","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522926","url":null,"abstract":"For VLSI application-specific arrays and other regular VLSI circuits, two techniques are available for yield enhancement, namely defect-tolerance and layout modifications. In this paper, we compare these two yield enhancement approaches by using adders as an example. Our yield projections indicate that the layout modification technique is more efficient when the defect density is low, while reconfiguration is more efficient for a high defect density. However, from the point of view of effective yield, the layout modification is superior to defect tolerance in the practical range of defect density.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"25 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114046116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}