[1990] Proceedings of the International Conference on Application Specific Array Processors: Latest Articles

Bit-level systolic algorithm for the symmetric eigenvalue problem
J. Delosme
{"title":"Bit-level systolic algorithm for the symmetric eigenvalue problem","authors":"J. Delosme","doi":"10.1109/ASAP.1990.145511","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145511","url":null,"abstract":"An arithmetic algorithm is presented which speeds up the parallel Jacobi method for the eigen-decomposition of real symmetric matrices. After analyzing the elementary mathematical operations in the Jacobi method (i.e. the evaluation and application of Jacobi rotations), the author devises arithmetic algorithms that effect these mathematical operations with few primitive operations (i.e. few shifts and adds) and enable the most efficient use of the parallel hardware. The matrices to which the plane Jacobi rotations are applied are decomposed into even and odd parts, enabling the application of the rotations from a single side and thus removing some sequentiality from the original method. The rotations are evaluated and applied in a fully concurrent fashion with the help of an implicit CORDIC algorithm. In addition, the CORDIC algorithm can perform rotations with variable resolution, which leads to a significant reduction in the total computation time.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"496 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116199896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
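The abstract of this entry describes building Jacobi rotations from a few shifts and adds with the help of CORDIC. As a rough illustration only, the sketch below shows the textbook explicit rotation-mode CORDIC, not the implicit, variable-resolution variant the paper proposes. The name `cordic_rotate`, the iteration count, and the use of floating point in place of fixed-point shifts are assumptions made for this example.

```python
import math


def cordic_rotate(x, y, angle, iterations=32):
    """Rotate (x, y) counterclockwise by `angle` radians using textbook CORDIC.

    Hypothetical helper for illustration. Converges for |angle| up to about
    1.74 rad, the sum of all micro-rotation angles atan(2**-i).
    """
    # Each micro-rotation stretches the vector; accumulate the total gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** (-i)  # in hardware this is a right shift by i bits
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)  # atan values would come from a small table

    # Undo the accumulated gain so the result has the original length.
    return x / gain, y / gain
```

Each loop iteration uses only a shift, an add or subtract, and a table lookup, which is why the scheme suits bit-level hardware. Lowering `iterations` trades accuracy for time, which loosely mirrors the variable-resolution idea mentioned in the abstract.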
Reconfiguration of FFT arrays: a flow-driven approach
A. Antola, N. Scarabottolo
{"title":"Reconfiguration of FFT arrays: a flow-driven approach","authors":"A. Antola, N. Scarabottolo","doi":"10.1109/ASAP.1990.145476","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145476","url":null,"abstract":"A new reconfiguration algorithm for defect and fault tolerance in fast Fourier transform (FFT) two-dimensional arrays is presented. The reconfiguration scheme is based on the data flow of the algorithm to minimize the overhead due to the re-routing of information in the reconfigured array. Evaluation of the effectiveness of this approach shows a significant increase in system robustness with respect to other, non-dedicated reconfiguration approaches. Moreover, the possibility of choosing between two reconfiguration algorithms characterized by different complexities and efficiencies results in both an optimal, host-driven reconfiguration (particularly suited for end-of-production yield enhancement) and a fast, self-performed reconfiguration (suited for on-line reliability enhancement).","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129766511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Digit-serial VLSI microarchitecture
S. Smith, J. Payne, R. Morgan
{"title":"Digit-serial VLSI microarchitecture","authors":"S. Smith, J. Payne, R. Morgan","doi":"10.1109/ASAP.1990.145482","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145482","url":null,"abstract":"The authors illustrate the techniques by which a simple function library may be widely parameterized to meet the diverse function, throughput and accuracy requirements in high-performance integer arithmetic applications. In a design automation environment the user's view of these structures is, in the case of multipliers and adders, a simple functional icon carrying synthetic parameters which are derived from global throughput and accuracy requirements. Shifters are included automatically for consistency, allowing usage of the specified numerical resources to be maximized for any application. Processors of throughputs approaching one billion operations/sec may be easily assembled using these techniques, figures which are difficult to achieve in conventional architectures. The full power of parallelism and pipelining is brought to bear on computational problems, the price paid being the loss of inherent programmability.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128581555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A formal design methodology for parallel architectures
K. Elleithy, M. Bayoumi
{"title":"A formal design methodology for parallel architectures","authors":"K. Elleithy, M. Bayoumi","doi":"10.1109/ASAP.1990.145496","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145496","url":null,"abstract":"The authors introduce a formal approach for synthesis of array architectures. The methodology provides two main features: completeness and correctness. Completeness means the ability to use the approach for any general algorithm. Correctness is achieved by using a set of transformations that are proved to be correct. Four different forms are used to express the input algorithm: simultaneous recursion, recursion with respect to different variables, fixed nesting, and variable nesting. Four different architectures for the same algorithm are obtained. As an example, a matrix-matrix multiplication algorithm is used to obtain four different optimal architectures. The different architectures of this example are compared in terms of area, time, broadcasting, and required hardware.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115974670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Spacetime-minimal systolic architectures for Gaussian elimination and the algebraic path problem
A. Benaini, Y. Robert
{"title":"Spacetime-minimal systolic architectures for Gaussian elimination and the algebraic path problem","authors":"A. Benaini, Y. Robert","doi":"10.1109/ASAP.1990.145509","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145509","url":null,"abstract":"The authors have designed two systolic arrays that are both time-minimal and space-minimal for Gaussian elimination and the algebraic path problem (APP), thereby establishing the systolic complexity of these two computational kernels. The systolic computation is modeled by a directed acyclic graph (DAG) with nodes corresponding to computed values and arcs denoting dependencies. The computation DAG is taken to be fixed and given. The time to compute a DAG is determined when a timing function is assigned, or scheduled, to the nodes, subject to the constraints that a node can be computed only when its predecessors (the nodes which it depends upon) have been computed at previous steps, and no processor can compute two different nodes at the same time step. For a problem of size n, the authors obtain an execution time T(n)=3n-1 using A(n)=n^2/4+O(n) processors for Gaussian elimination, and T(n)=5n-2 and A(n)=n^3/3+O(n) for the APP.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115332469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
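The scheduling model in this entry's abstract imposes two constraints: a node runs only after all of its predecessors, and no processor computes two nodes in the same time step. The sketch below checks both for a given schedule. It is an illustrative validity check only, not the authors' construction of the minimal arrays; the name `is_valid_schedule` and the dictionary-based representation are assumptions.

```python
def is_valid_schedule(preds, time, proc):
    """Check a DAG schedule against the two constraints from the abstract.

    preds: dict mapping each node to an iterable of its predecessor nodes
    time:  dict mapping each node to its integer time step
    proc:  dict mapping each node to the processor that computes it
    (Hypothetical representation chosen for this example.)
    """
    # Constraint 1: every predecessor must be computed at a strictly
    # earlier step than the node that depends on it.
    for node, parents in preds.items():
        if any(time[p] >= time[node] for p in parents):
            return False

    # Constraint 2: a (processor, step) slot may hold at most one node.
    used_slots = set()
    for node in preds:
        slot = (proc[node], time[node])
        if slot in used_slots:
            return False
        used_slots.add(slot)
    return True


# A four-node diamond: b and c depend on a, and d depends on both.
diamond = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
steps = {"a": 0, "b": 1, "c": 1, "d": 2}
two_procs = {"a": 0, "b": 0, "c": 1, "d": 0}
one_proc = {"a": 0, "b": 0, "c": 0, "d": 0}
```

With `two_procs` the schedule is valid and finishes in three steps; with `one_proc` the same timing is rejected because `b` and `c` collide on processor 0 at step 1. In this model, the total time is the number of steps used and the space is the number of distinct processors.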
The design of a high-performance scalable architecture for image processing applications
C. T. Gray, Wentai Liu, T. Hughes, R. Cavin
{"title":"The design of a high-performance scalable architecture for image processing applications","authors":"C. T. Gray, Wentai Liu, T. Hughes, R. Cavin","doi":"10.1109/ASAP.1990.145506","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145506","url":null,"abstract":"The authors present the organization of an interleaved wrap-around memory system for a partitionable parallel/pipeline architecture with P pipes of L processors each. The architecture is designed to efficiently support real-time image processing and computer vision algorithms, especially those requiring global data operations. The interleaved memory system makes the architecture highly scalable in that L and P can be chosen to optimize performance for particular problems and reconfigurable in that, once L and P are fixed, problems of any size can still be mapped onto the architecture. The authors demonstrate techniques and methods for mapping computational structures to the architecture by considering the case of the 1-D butterfly network (1DBN). Since many other computational structures can be mapped to 1DBN, this gives a firm application base for the architecture. The authors also demonstrate methods for scheduling and controlling the memory system.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"42 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124386922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Byte-serial convolvers
L. Dadda
{"title":"Byte-serial convolvers","authors":"L. Dadda","doi":"10.1109/ASAP.1990.145489","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145489","url":null,"abstract":"It is shown that previously proposed bit-serial convolver schemes (with weights in parallel form), working with zero separation between samples, can be transformed into byte-serial input schemes with a comparable clock rate, thus affording an increase in sampling rate equal to the number of bits in each byte. This is achieved by adopting a modified carry save circuit. The proposed schemes are based on a modified version of serial-parallel multipliers and on the use of pre-computed multiples of the weights. The case of 2-bit bytes is fully developed. It is shown that the use of samples represented in a biased binary number system leads to schemes that are only slightly more complex than the corresponding bit-serial schemes. The bit rate is determined by the delays of a full adder and a flip-flop. The schemes are composed of a number of bit-slices and appear to be easily partitionable into identical cascaded modules suitable for a fault tolerant architecture and a WSI implementation.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123512895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
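This entry's abstract relies on pre-computed multiples of the weights and develops the case of 2-bit bytes. The arithmetic idea can be sketched in software as a multiplication that consumes the sample two bits at a time. This is an assumption-laden illustration for unsigned samples only: it does not model the modified carry-save circuit or the biased number system from the paper, and the name `digit_serial_multiply` is hypothetical.

```python
def digit_serial_multiply(sample, weight, num_digits):
    """Multiply an unsigned sample by a weight, two sample bits per step.

    Hypothetical helper for illustration. `num_digits` is the number of
    2-bit digits needed to cover the sample (its bit width divided by 2).
    """
    # Multiples of the weight are computed once and reused for every sample.
    multiples = [0, weight, 2 * weight, 3 * weight]

    acc = 0
    for step in range(num_digits):
        # Take the next 2-bit "byte" of the sample, least significant first.
        digit = (sample >> (2 * step)) & 0b11
        # Select a pre-computed multiple and add it at the right position.
        acc += multiples[digit] << (2 * step)
    return acc
```

A sample of 2k bits is consumed in k steps instead of the 2k steps a bit-serial scheme would take, which matches the abstract's claim that the sampling rate rises by the number of bits per byte at a comparable clock rate.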
A real-time software programmable processor for HDTV and stereo scope signals
T. Nishitani, I. Tamitani, H. Harasaki, M. Yano
{"title":"A real-time software programmable processor for HDTV and stereo scope signals","authors":"T. Nishitani, I. Tamitani, H. Harasaki, M. Yano","doi":"10.1109/ASAP.1990.145459","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145459","url":null,"abstract":"The architecture is an expanded version of a previously reported video signal processor in which a number of parallel processor clusters can be combined in a tandem connection form or in a parallel connection form. The new video signal processor introduces programmable time-expansion and time-compression circuits to A-to-D and D-to-A converters, respectively, for coping with high speed HDTV signals. It also employs input/output switch units before and after parallel processor clusters. The introduction of input/output switch units to the parallel processor clusters makes it possible to input several video signals simultaneously. By these additional units, an HDTV signal is converted to a set of NTSC level video signals in the time-expansion circuit. Every NTSC level video signal is then delivered to parallel processor clusters through an input switch unit. After processing in clusters, NTSC level signals are converted to an HDTV signal through an output switch unit and time-compression circuits. This architecture can be applied to stereo scope processing.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123559305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Mapping algorithms onto the TUT cellular array processor
J. Viitanen, T. Korpiharju, J. Takala, H. Kiminkinen
{"title":"Mapping algorithms onto the TUT cellular array processor","authors":"J. Viitanen, T. Korpiharju, J. Takala, H. Kiminkinen","doi":"10.1109/ASAP.1990.145461","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145461","url":null,"abstract":"The Tampere University of Technology Cellular Array (TUTCA) processor array is based on a dynamically configurable logic cell array. It is intended for efficient implementation of the direct mapping dataflow principle with a self-timed, distributed control structure. The architecture of the processor, principles of mapping algorithms on it, and the compiler of the dataflow language are described. The language used for programming is a slightly modified version of DFL. The main features of DFL, the parser, the array processing, the graph structure generated by DFL, and the performance and exploitation of parallelism are considered.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129379024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
GRAPE: a special-purpose computer for N-body problems
J. Makino, T. Ito, T. Ebisuzaki, D. Sugimoto
{"title":"GRAPE: a special-purpose computer for N-body problems","authors":"J. Makino, T. Ito, T. Ebisuzaki, D. Sugimoto","doi":"10.1109/ASAP.1990.145455","DOIUrl":"https://doi.org/10.1109/ASAP.1990.145455","url":null,"abstract":"GRAPE (GRAvity PipE) is a special-purpose computer designed to accelerate the numerical integration of the astrophysical N-body problem. The prototype hardware, GRAPE-1, is designed as the backend processor that calculates the gravitational interaction between particles. All other calculations are performed on the host computer connected to GRAPE-1. For large-N calculations (N ≳ 10^4), GRAPE-1 achieves about 200 Mflops equivalent in one board of the size of about 40 cm by 30 cm, consuming 2.5 W of power. The specialized pipelined architecture of the GRAPE-1 optimized for the large N calculation is the key to the high performance. The authors describe the design, construction and programming of GRAPE-1. The architecture is quite simple, and it is easy to put one pipeline into one LSI chip and make many pipelines work in parallel, without creating a communication bottleneck.","PeriodicalId":438078,"journal":{"name":"[1990] Proceedings of the International Conference on Application Specific Array Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128200945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
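According to this entry's abstract, GRAPE-1 computes only the gravitational interaction between particles and leaves everything else to the host. The kernel being offloaded is the direct pairwise sum, which costs on the order of N squared operations per step. The sketch below shows that sum in plain Python as an illustration of the workload, not of the GRAPE-1 pipeline or its number formats; the name `accelerations`, the unit gravitational constant, and the optional softening length `eps` are assumptions.

```python
def accelerations(pos, mass, G=1.0, eps=0.0):
    """Direct-summation gravitational accelerations for N particles.

    pos:  list of [x, y, z] positions
    mass: list of particle masses
    G:    gravitational constant (assumed 1.0, i.e. N-body units)
    eps:  softening length (assumed parameter; 0.0 means pure Newtonian)
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a particle exerts no force on itself
            # Separation vector pointing from particle i to particle j.
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = sum(c * c for c in d) + eps * eps
            inv_r3 = r2 ** -1.5
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3
    return acc
```

In a host-plus-accelerator split like the one described, the host would call a routine of this shape once per time step and then advance positions and velocities itself. The inner loop body is identical for every pair, which is what makes it a natural fit for a fixed hardware pipeline.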