Design of a Processor Element for a High Performance Massively Parallel SIMD System

Int. J. High Speed Comput. Pub Date : 1995-09-01 DOI:10.1142/S0129053395000208

D. Beal, C. Lambrinoudakis


Abstract

This paper describes the architecture of the General Purpose with Floating Point support (GPFP) processing element (PE), which exploits the additional circuitry made available by VLSI advances to provide on-chip memory and cost-effective extra functionality. A major goal was to accelerate floating-point arithmetic. This was combined with the architectural aims of cost-effectiveness, obtaining the floating-point capability from general-purpose units, and retaining the 1-bit manipulations available in the earlier generation. With a 50 MHz clock, each PE is capable of 2.5 MegaFlops. Normalized to the same clock rate, the GPFP PE far exceeds first-generation PEs, outperforming the DAP by a factor of 50 and the MPP by a factor of 20, and it also outperforms the recent MasPar design by a factor of four. A 32×32 GPFP array is capable of up to 2.5 GigaFlops and 6500 MIPS on 32-bit additions. These speedups are obtained through architectural features rather than increased data-path width, and are combined with a parsimonious use of circuitry that is compatible with massively parallel fabrication. The GPFP also incorporates Reconfigurable Local Control (RLC), a technique that combines a considerable degree of local autonomy within PEs with microcode flexibility, giving the machine improved general-purpose programmability in addition to floating-point numerical performance.
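The per-PE and array-level figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, not something taken from the paper: it assumes that array peak performance is the simple sum of the per-PE peaks and that the MIPS figure is an aggregate over all PEs. The variable names are illustrative.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 50e6      # 50 MHz clock
PE_MFLOPS = 2.5      # per-PE floating-point rate
ARRAY_SIDE = 32      # 32x32 array
ARRAY_MIPS = 6500    # array rate on 32-bit additions

# Assumption: array peak is the sum of the per-PE peaks.
num_pes = ARRAY_SIDE * ARRAY_SIDE
array_gflops = num_pes * PE_MFLOPS / 1000

# Implied average clock cycles per floating-point operation in one PE.
cycles_per_flop = CLOCK_HZ / (PE_MFLOPS * 1e6)

# Assumption: the MIPS figure is aggregated over all PEs.
mips_per_pe = ARRAY_MIPS / num_pes
cycles_per_add32 = CLOCK_HZ / (mips_per_pe * 1e6)

print(f"PEs in array:          {num_pes}")
print(f"Array peak:            {array_gflops:.2f} GFlops")
print(f"Cycles per flop:       {cycles_per_flop:.1f}")
print(f"32-bit add MIPS / PE:  {mips_per_pe:.2f}")
print(f"Cycles per 32-bit add: {cycles_per_add32:.2f}")
```

Under these assumptions, 1024 PEs at 2.5 MegaFlops give about 2.56 GigaFlops, in line with the quoted "up to 2.5 GigaFlops", and the 50 MHz clock implies roughly 20 cycles per floating-point operation and just under 8 cycles per 32-bit addition in each PE.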


