GPGPU role within a 500 TFLOPS heterogeneous cluster

GPGPU-3 | Pub Date: 2010-03-14 | DOI: 10.1145/1735688.1735700

R. Linderman


Citations: 0

Abstract



The outstanding price-performance of GPGPU technology has made it a key architectural engine within a 500 TFLOPS Heterogeneous Cluster being assembled by the Air Force Research Laboratory in Rome, NY. This new machine will likely be the largest interactive HPC system in the world, with an overall system price-performance of $4/GFLOPS and a power efficiency of 1.5 TFLOPS/kW. The cluster's heterogeneity reflects a combination of roughly 300 TFLOPS from 2000 PS3 gaming consoles plus 200 TFLOPS from GPGPUs closely coupled to the 84 subcluster headnodes within the overall machine.

The blend of GPGPUs, the Cell processors within the PS3s, and the Xeon processors in the headnodes is deliberate: it offers alternative programming environments that suit different applications, or that can be combined on portions of a single application. The large DRAM memory and local disk capacity of the multicore Xeon headnodes provide a familiar, popular computing environment for handling a wide swath of application code. For application segments that require higher performance, the Cell and GPGPU architectures are available for acceleration through large-scale parallelization.

This talk will discuss programming experiences to date on the GPGPUs, Cells, and Xeons, and the attributes of algorithms that favor each of these components of the heterogeneous machine.
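As a back-of-envelope check, the figures quoted in the abstract are consistent (300 + 200 = 500 TFLOPS), and they imply a few numbers the abstract does not state directly. The short Python sketch that follows derives those numbers. The constant names and the `cluster_estimates` helper are hypothetical, introduced only for illustration. The derived cost, power draw, and per-device figures are implied values, not numbers reported by the author. The sketch also assumes the quoted TFLOPS figures are nominal, since the abstract does not say whether they are peak or sustained.

```python
"""Back-of-envelope checks on the cluster figures quoted in the abstract.

Inputs are taken from the abstract; outputs are implied values, not figures
reported by the author. Names here are hypothetical, for illustration only.
Assumption: the TFLOPS figures are nominal (peak vs. sustained is not stated).
"""

PS3_TFLOPS = 300.0        # "roughly 300 TFLOPS" from the PS3 consoles
GPGPU_TFLOPS = 200.0      # GPGPUs closely coupled to the headnodes
NUM_PS3 = 2000            # PS3 gaming consoles
NUM_HEADNODES = 84        # subcluster headnodes
DOLLARS_PER_GFLOPS = 4.0  # overall system price-performance
TFLOPS_PER_KW = 1.5       # overall power efficiency


def cluster_estimates() -> dict[str, float]:
    """Return the total capacity plus values implied by the quoted ratios."""
    total_tflops = PS3_TFLOPS + GPGPU_TFLOPS
    return {
        "total_tflops": total_tflops,
        # 1 TFLOPS = 1000 GFLOPS, so cost = $/GFLOPS * total GFLOPS
        "implied_cost_usd": DOLLARS_PER_GFLOPS * total_tflops * 1000.0,
        # power = throughput / efficiency
        "implied_power_kw": total_tflops / TFLOPS_PER_KW,
        # average share of the PS3 capacity per console
        "gflops_per_ps3": PS3_TFLOPS * 1000.0 / NUM_PS3,
        # average GPGPU capacity attached to each headnode
        "gpgpu_tflops_per_headnode": GPGPU_TFLOPS / NUM_HEADNODES,
    }


if __name__ == "__main__":
    for name, value in cluster_estimates().items():
        print(f"{name}: {value:,.2f}")
```

Running it prints a total of 500.00 TFLOPS, an implied cost of 2,000,000.00 USD, an implied power draw of 333.33 kW, 150.00 GFLOPS per PS3, and about 2.38 TFLOPS of GPGPU capacity per headnode. Keeping the inputs as named constants makes it easy to rerun the check if the quoted figures change.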


Source journal

GPGPU-3
