[1990] Proceedings of the International Conference on Application Specific Array Processors: latest articles

A design methodology for fixed-size systolic arrays
J. Bu, E. Deprettere, P. Dewilde
{"title":"A design methodology for fixed-size systolic arrays","authors":"J. Bu, E. Deprettere, P. Dewilde","doi":"10.1109/ASAP.1990.145495","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145495","url":null,"abstract":"The authors present a methodology to design fixed-size systolic arrays. It allows a systematic and hierarchical mapping of full-size arrays to fixed-size arrays. Two processor-clustering techniques are described. They can be used to achieve the following design objectives: (1) transforming inefficient arrays into efficient arrays, (2) reducing the size of an array, (3) reducing the dimension of an array, and (4) balancing local memory and external communication of processors. A technique is described to cluster processors in such a way that the number of I/O pins of the resulting processor is independent of the number of processors that are clustered. The approach presented unifies and generalizes array reduction techniques.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129521841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 67
A processor-time minimal systolic array for transitive closure
C. Scheiman, P. Cappello
{"title":"A processor-time minimal systolic array for transitive closure","authors":"C. Scheiman, P. Cappello","doi":"10.1109/ASAP.1990.145439","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145439","url":null,"abstract":"A directed acyclic graph (DAG) model of algorithms is used. For a given DAG the authors focus on processor-time minimal multiprocessor schedules: time minimal multiprocessor schedules that use as few processors as possible. The Kung, Lo and Lewis (KLL) algorithm (S.-Y. Kung et al., 1987) for computing the transitive closure of a relation over a set of n elements requires at least 5n-4 steps. Their systolic array comprises n² processing elements. Here, it is first shown that any multiprocessor that achieves this 5n-4 time bound needs at least (n²/3) processing elements. Then, a processor-time minimal systolic array realizing the KLL algorithm's DAG is constructed. Its (n²/3) processing elements are organized as a cylindrically connected 2-D mesh when n ≡ 0 (mod 3). When n ≢ 0 (mod 3), the 2-D mesh is connected as a twisted torus.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125454784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
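For orientation, the quantities named in the abstract above can be sketched in Python. The closure routine is the textbook Warshall algorithm, used here only as a sequential reference; it is not the KLL systolic schedule. The function names are illustrative assumptions, and the helper simply evaluates the 5n-4 step count and the n² and n²/3 processing-element figures as quoted (any rounding applied in the paper is not stated in this listing).

```python
from fractions import Fraction


def transitive_closure(adj):
    """Warshall's algorithm on a boolean adjacency matrix (sequential reference only)."""
    n = len(adj)
    reach = [row[:] for row in adj]  # copy so the caller's matrix is not modified
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def quoted_figures(n):
    """Evaluate the figures quoted in the abstract for a relation over n elements."""
    return {
        "time_steps": 5 * n - 4,               # KLL step count
        "pe_original": n * n,                  # PEs in the original KLL array
        "pe_lower_bound": Fraction(n * n, 3),  # quoted bound, left unrounded
    }


# A 3-element chain 0 -> 1 -> 2 (hypothetical test relation).
chain = [
    [False, True, False],
    [False, False, True],
    [False, False, False],
]
closure = transitive_closure(chain)
figures = quoted_figures(3)
```

The closure of the chain gains the single derived pair (0, 2), and `quoted_figures` makes it easy to compare the n² and n²/3 processor counts for a given n.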
A 3-D wafer scale architecture for early vision processing
S. T. Toborg
{"title":"A 3-D wafer scale architecture for early vision processing","authors":"S. T. Toborg","doi":"10.1109/ASAP.1990.145462","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145462","url":null,"abstract":"A massively parallel SIMD cellular computer is designed for processing early vision algorithms based on regularization theory and Markov random field (MRF) models. Algorithmic requirements and implementation issues are reviewed in detail for edge detection/surface reconstruction. The development of 3-D wafer scale integration (WSI) technologies that offer an ideal medium for implementing many early vision algorithms is discussed. An edge detection algorithm is mapped to the 3-D WSI computer that consists of a 128×128 array of processors formed by stacking 15 four-inch CMOS wafers. This mapping is used as the basis for an enhanced array processor tailored for multiresolution MRF processing. Enhancements are proposed that would boost peak performance to over a trillion operations per second, using a stack of 40 wafers, with a total system volume of 820 cm³ and consuming about 370 W.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125557165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Embedding pyramids in array processors with pipelined busses
Zicheng Guo, R. Melhem
{"title":"Embedding pyramids in array processors with pipelined busses","authors":"Zicheng Guo, R. Melhem","doi":"10.1109/ASAP.1990.145501","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145501","url":null,"abstract":"The concept of pipelined buses for parallel architectures diverges from the conventional exclusive access buses and offers both possibilities and challenges for significantly improving the efficiency of interprocessor communications in parallel computers. The authors present an efficient embedding of pyramids in array processors with pipelined buses. The embedding has the property that all the neighboring nodes in the pyramid are mapped to the same bus. Thus, any two neighbors in the embedded pyramid can communicate with each other using a single bus cycle.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126226524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Towards the automated design of application specific array processors (ASAPs)
A. P. Marriott, A. Duller, R. Storer, A. Thomson, M. R. Pout
{"title":"Towards the automated design of application specific array processors (ASAPs)","authors":"A. P. Marriott, A. Duller, R. Storer, A. Thomson, M. R. Pout","doi":"10.1109/ASAP.1990.145477","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145477","url":null,"abstract":"The authors describe the architecture and VLSI design of GLiTCH, an associative processor array chip designed for computer vision applications. The design is built from a library of cells, which can be used in conjunction with high level functional specifications to rapidly design new application specific array processors. The objective is to design a system which will allow application specific associative array processors (ASAPs) to be defined, simulated and then produced in silicon automatically from high level description data. Using such techniques should reduce the design cycle time to the point where processor arrays optimized for a particular problem could be fabricated. The authors describe some of the VLSI design which has been done towards achieving the automatic layout of ASAPs. Specifically, the design decisions and trade-offs made in the implementation of a test chip are described and applied to the problem of producing ASAPs.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128017094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
A graph-based approach to map matrix algorithms onto local-access processor arrays
J. Moreno, T. Lang
{"title":"A graph-based approach to map matrix algorithms onto local-access processor arrays","authors":"J. Moreno, T. Lang","doi":"10.1109/ASAP.1990.145499","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145499","url":null,"abstract":"The authors describe the application of the multi-mesh graph (MMG) method to the mapping of large matrix algorithms onto class-specific local-access processor arrays. These arrays consist of cells with large local memory (i.e., memory size proportional to the size of the problems) and low cell bandwidth (much smaller than the cell computation rate). The results given indicate that the MMG method allows the analysis of such issues as allocation of operations to cells, load balancing, scheduling, synchronization, and overhead in computations and data transfers. These aspects are illustrated by mapping the LU-decomposition algorithm onto a linear memory-linked array. Performance estimates indicate that mapping with the MMG method produces 94% utilization of cells in the target structure used. Therefore, the MMG is a suitable tool for mapping matrix algorithms onto pre-existing arrays.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134576576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
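The entry above uses LU decomposition as its example workload. As a reminder of what is being mapped, here is a minimal sequential sketch of Doolittle LU decomposition without pivoting in plain Python. It says nothing about the MMG method or the memory-linked array mapping, which this listing does not detail; the function name and the test matrix are illustrative assumptions, and the routine assumes every pivot is nonzero.

```python
def lu_decompose(matrix):
    """Doolittle LU without pivoting: returns (lower, upper) with unit-diagonal lower."""
    n = len(matrix)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(value) for value in row] for row in matrix]
    for k in range(n):
        for i in range(k + 1, n):
            # Multiplier that eliminates entry (i, k); assumes a nonzero pivot.
            factor = upper[i][k] / upper[k][k]
            lower[i][k] = factor
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


# Hypothetical 2x2 example.
lower, upper = lu_decompose([[4.0, 3.0], [6.0, 3.0]])
```

The three nested loops are the iteration space that a mapping method has to allocate to cells and schedule; the inner update is the operation whose data dependences drive the communication pattern.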
Domain flow and streaming architectures
E. T. L. Omtzigt
{"title":"Domain flow and streaming architectures","authors":"E. T. L. Omtzigt","doi":"10.1109/ASAP.1990.145479","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145479","url":null,"abstract":"The author introduces the main ideas of a system compiler for affine dependence algorithms. The first idea is a streaming architecture, which is a machine model for the compiler that reduces control overhead in comparison with an ensemble of von Neumann architectures. Such a streaming architecture is a dedicated architecture programmed with an incremental array instruction to be able to run any instance of the problem. The second idea is the domain flow model, which is a program representation that captures the communication of the algorithm. The structure of the compiler reflects the division between synthesis and code generation. A general front-end generates a domain flow graph. Both synthesis and code generation phases work off this data structure. However, each phase has its own back-end. For the synthesis phase the back-end is a design critic combined with an expert system which makes decisions about what to do next to satisfy the design goals. For the code generation phase the back-end iterates through different partitioning and code generation strategies.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114393341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
A fault-tolerant two-dimensional sorting network
J. Krammer, H. Arif
{"title":"A fault-tolerant two-dimensional sorting network","authors":"J. Krammer, H. Arif","doi":"10.1109/ASAP.1990.145469","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145469","url":null,"abstract":"The authors evaluate a class of sorting algorithms which can be adapted to a faulty network with nearest neighbor interconnections by determining a suitable indexing scheme. A worst case sorting time of O(N) is proved for these sorters. Simulation results show that the average sorting time of the fault-tolerant sorters is only slightly higher than O(√N), and therefore is comparable to that of non-fault-tolerant sorting algorithms. This algorithmic approach does not require additional wiring for reconfiguration, and hence the amount of additional circuitry required for fault-tolerance is very small. An efficient procedure for calculating an indexing scheme is presented and simulation results are shown. Furthermore, an efficient strategy for testing the network is proposed.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114401002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Designing specific systolic arrays with the API15C chip
P. Frison, E. Gautrin, D. Lavenier, Jean-Luc Scharbarg
{"title":"Designing specific systolic arrays with the API15C chip","authors":"P. Frison, E. Gautrin, D. Lavenier, Jean-Luc Scharbarg","doi":"10.1109/ASAP.1990.145486","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145486","url":null,"abstract":"The API15C processor, a building block for different systolic structures, is designed exclusively for single-instruction, multiple-data (SIMD) execution mode. To support this mode, the instruction set includes special control instructions. Three parallel I/O ports are available for different interconnection schemes. The API15C chip is designed in a CMOS 2-μm technology. It contains 45000 transistors on a 6-mm × 6.2-mm silicon area. The functionality of the circuit was tested successfully after the first run. It executes one instruction per clock phase of 100 ns, giving a global rate of 10 MIPS. To validate this processing element as a building block for systolic structures, a programmable interface and two single board machines were developed. The first is an 18 processor linear structure able to support a wide range of applications. The second is a 28 processor bidimensional structure for a specific application of string comparison. The instruction set is particularly well-suited for SIMD operation.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116119726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Two-level pipelined implementation of systolic block Householder transformation with application to RLS algorithm
K. J. Liu, S. Hsieh, K. Yao
{"title":"Two-level pipelined implementation of systolic block Householder transformation with application to RLS algorithm","authors":"K. J. Liu, S. Hsieh, K. Yao","doi":"10.1109/ASAP.1990.145510","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145510","url":null,"abstract":"The authors propose a systolic block Householder transformation (SBHT) approach to implement the Householder transformation (HT) on a systolic array as well as its application to the recursive-least-squares (RLS) algorithm. Since the data are fetched in a block manner, vector operations are in general required for the vectorized array. However, by using a modified HT algorithm, a two-level pipelined implementation can be used to pipeline the SBHT systolic array both at the vector and word levels. The throughput can be as fast as that of the Givens rotation method. The approach makes the HT amenable for VLSI implementation as well as applicable to real-time high throughput applications of modern signal processing. The constrained RLS problem using the SBHT RLS systolic array is also considered.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129874672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
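As background for the entry above, here is a minimal pure-Python sketch of a single Householder reflection, which maps a vector onto a multiple of the first unit vector. It is the standard scalar formulation, not the block (SBHT) or two-level pipelined variant the paper proposes; the function name is an illustrative assumption.

```python
import math


def householder_reflect(x):
    """Apply H = I - 2 v v^T / (v^T v) to x, with v chosen to zero all but x[0]."""
    norm = math.sqrt(sum(value * value for value in x))
    if norm == 0.0:
        return list(x)  # nothing to reflect
    # Add the norm with the sign of x[0] to avoid cancellation in v[0].
    sign = 1.0 if x[0] >= 0 else -1.0
    v = list(x)
    v[0] += sign * norm
    v_dot_x = sum(vi * xi for vi, xi in zip(v, x))
    v_dot_v = sum(vi * vi for vi in v)
    scale = 2.0 * v_dot_x / v_dot_v
    return [xi - scale * vi for xi, vi in zip(x, v)]


# Hypothetical input: the 3-4-5 vector reflects onto the first axis.
reflected = householder_reflect([3.0, 4.0])
```

Each reflection needs two inner products over the whole vector, which is why the transformation is naturally a vector-level operation and why pipelining it at both vector and word level is the point of interest.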