{"title":"A solid translation engine using ray representation","authors":"T. Alexander, J. L. Ellis, Gershon Kedem","doi":"10.1109/ASAP.1995.522919","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522919","url":null,"abstract":"We describe an extension to the geometric domain of solid modeling to include solids defined by spatial sweeping and Minkowski sums. We develop an efficient, parallel algorithm for the translation of such solid models. An architecture and design of an array processor that implements this algorithm are presented. We discuss some applications of the new computer to solid modeling and CAD/CAM and modeling of large biomolecules (proteins) for rational drug design.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114531189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interfacing FPGA/VLSI processor arrays","authors":"Joseph A. Fernando, Jack S. N. Jean","doi":"10.1109/ASAP.1995.522927","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522927","url":null,"abstract":"Mapping DSP algorithms to FPGA/VLSI circuits is an important issue in Application-Specific Array Processor design. Since a DSP algorithm can be abstracted as a graph where each node is a shift-invariant DG (Dependence Graph) and the edges denote the data flow, it is possible to map a DSP algorithm to a set of processor arrays with some interface circuits. The interface design depends on the projection/scheduling vectors used on the two corresponding shift-invariant DGs and the interfacing cost is very significant when a lot of delays are necessary or when the processor operations are relatively inexpensive in terms of area. Therefore, when selecting these vectors in a design environment, the effect on the interface circuit must be accurately computed. In this paper, various interface circuit designs are presented and categorized based on the data conversion requirement. An algorithm to select a design from many design options to minimize the cost is also described.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121434469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting the decomposition of Karp, Miller and Winograd","authors":"A. Darte, F. Vivien","doi":"10.1109/ASAP.1995.522901","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522901","url":null,"abstract":"This paper is devoted to the construction of multi-dimensional schedules for a system of uniform recurrence equations. We show that this problem is dual to the problem of computability of a system of uniform recurrence equations. We propose a new study of the decomposition algorithm first proposed by Karp, Miller and Winograd: we base our implementation on linear programming resolutions whose duals give exactly the desired multi-dimensional schedules. Furthermore, we prove that the schedules built this way are optimal up to a constant factor.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131788762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time-optimal ranking algorithms on sorted matrices","authors":"V. Bokka, H. Gurla, S. Olariu, J. Schwing, L. Wilson","doi":"10.1109/ASAP.1995.522904","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522904","url":null,"abstract":"Answering rank queries is a recurring operation in various application domains including geographic data processing, information retrieval, database design, information management, and medical image processing. Many of these applications involve data stored in a matrix satisfying a number of properties. One property that occurs time and again in applications specifies that the rows and the columns of the matrix are independently sorted. It is customary to refer to such a matrix as sorted. An instance of the Batched Ranking problem (BR, for short) involves a sorted matrix A of items from a totally ordered universe, along with a collection Q of queries of the following type: for a query q_j one is interested in the number of items in A that are smaller than q_j. The BR problem asks for solving all queries in Q. In this work, we consider the BR problem in the following context: the matrix A is pretiled, one item per processor, onto an enhanced mesh of size √n×√n; the m queries are stored, one per processor, in the first m/√n columns of the platform. Our main contribution is twofold. First, we show that any algorithm that solves the BR problem must take at least Ω(log n+√m) time in the worst case. Second, we show that this time lower bound is tight on meshes of size √n×√n enhanced with multiple broadcasting, by exhibiting an algorithm solving the BR problem in O(log n+√m) time on such a platform.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"2010 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124990882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multilayer cellular algorithm for complex number multiplication","authors":"V. Markova","doi":"10.1109/ASAP.1995.522933","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522933","url":null,"abstract":"A new multilayer cellular algorithm for complex number multiplication is presented. The upper estimate of the time complexity is obtained. The design is based on an original model of distributed computation which is called Parallel Substitution Algorithm.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115261330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of parallel arithmetic in a cellular automaton","authors":"R. Squier, K. Steiglitz, Mariusz H. Jakubowski","doi":"10.1109/ASAP.1995.522928","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522928","url":null,"abstract":"We describe an approach to parallel computation using particle propagation and collisions in a one-dimensional cellular automaton using a Particle model: a Particle Machine (PM). Such a machine has the parallelism, structural regularity, and local connectivity of systolic arrays, but is general and programmable. It contains no explicit multipliers, adders, or other fixed arithmetic operations; these are implemented using fine-grain interactions of logical particles which are injected into the medium of the cellular automaton, and which represent both data and processors. We give parallel, linear-time implementations of addition, subtraction, multiplication and division.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125321255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Precise tiling for uniform loop nests","authors":"P. Calland, T. Risset","doi":"10.1109/ASAP.1995.522937","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522937","url":null,"abstract":"The subject of this article is a hyperplane partitioning problem applied to perfect loop nests. This work is aimed at increasing the computation granularity to reduce the overhead due to communication. This study is different from previous work as it takes redundant communication into account. We propose an algorithm giving the optimal solution and various examples to show the validity of this report.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117062734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An array processor for inner product computations using a Fermat number ALU","authors":"W. Luo, G. Jullien, N. Wigley, W. Miller, Zhongde Wang","doi":"10.1109/ASAP.1995.522931","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522931","url":null,"abstract":"This paper explores an architecture for parallel independent computations of inner products over the direct product ring ℜ_{257×17}. The structure is based on the polynomial mapping of the Modulus Replication RNS for calculations over dynamic ranges much larger than the product of the computational moduli. We show that the computational ring is optimal for our purposes, and introduce basic cells for the efficient calculation of all elements of the polynomial ring computations.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115804693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A processor-time-minimal schedule for 3D rectilinear mesh algorithms","authors":"C. Scheiman, P. Cappello","doi":"10.1109/ASAP.1995.522902","DOIUrl":"https://doi.org/10.1109/ASAP.1995.522902","url":null,"abstract":"The paper, using a directed acyclic graph (dag) model of algorithms, investigates precedence constrained multiprocessor schedules for the n_x×n_y×n_z directed rectilinear mesh. Its completion requires at least n_x+n_y+n_z−2 multiprocessor steps. Time-minimal multiprocessor schedules that use as few processors as possible are called processor-time-minimal. Lower bounds are shown for the n_x×n_y×n_z directed mesh, and these bounds are shown to be exact by constructing a processor-time-minimal multiprocessor schedule that can be realized on a systolic array whose topology is either a two dimensional mesh or skewed cylinder. The contribution of this paper is two-fold: It generalizes the previous work on cubical mesh algorithms, and it presents a more elegant mathematical method for deriving processor-time lower bounds for such problems.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133653401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synthesis of multirate VLSI arrays","authors":"P. Lenders, S. Rajopadhye","doi":"10.1109/ASAP.1995.523640","DOIUrl":"https://doi.org/10.1109/ASAP.1995.523640","url":null,"abstract":"Many applications in signal and image processing can be implemented on regular VLSI architectures such as systolic arrays. Multirate arrays, or MRAs are an extension of systolic arrays where different data streams propagate with different clocks. It is known that they can be modelled as systems of uniform recurrence equations over sparse polyhedral domains. Using well known linear index transformation rules for systems of affine recurrence equations, or SAREs, we show that MRAs constitute a particular proper subset of SAREs. We describe how an MRA can be systematically derived from an initial specification in the form of a mathematical equation. The main transformation that we use is dependency decomposition, and we illustrate our method by deriving a hitherto unknown decimation filter array that improves upon the hardware cost of previously published filters.","PeriodicalId":354358,"journal":{"name":"Proceedings The International Conference on Application Specific Array Processors","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114761118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}