R. Linderman. GPGPU-3, published 2010-03-14. DOI: 10.1145/1735688.1735700 (https://doi.org/10.1145/1735688.1735700)
GPGPU role within a 500 TFLOPS heterogeneous cluster
The outstanding price-performance of GPGPU technology has made it a key architectural engine within a 500 TFLOPS heterogeneous cluster being assembled by the Air Force Research Laboratory in Rome, NY. This new machine will likely be the largest interactive HPC system in the world, with an overall price-performance of $4/GFLOPS and a power efficiency of 1.5 TFLOPS/kW. The cluster is heterogeneous in that it combines roughly 300 TFLOPS from 2000 PS3 gaming consoles with 200 TFLOPS from GPGPUs closely coupled to the 84 headnodes of the subclusters that make up the overall machine.
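The headline figures above imply a few totals that the abstract does not state directly. The short sketch below works them out as back-of-envelope arithmetic only. It assumes the quoted $4/GFLOPS and 1.5 TFLOPS/kW apply uniformly to the full 500 TFLOPS, and that performance is spread evenly across the PS3s and the headnodes. The function name and dictionary keys are illustrative, and the results are derived estimates, not numbers reported by the author.

```python
# Figures quoted in the abstract.
TOTAL_TFLOPS = 500.0        # whole cluster
PS3_TFLOPS = 300.0          # contributed by the PS3 consoles
GPGPU_TFLOPS = 200.0        # contributed by the GPGPUs
PS3_COUNT = 2000
HEADNODE_COUNT = 84
DOLLARS_PER_GFLOPS = 4.0
TFLOPS_PER_KW = 1.5


def implied_figures():
    """Derive totals implied by the quoted ratios (estimates, not reported values)."""
    return {
        # 500 TFLOPS = 500,000 GFLOPS at $4 each.
        "system_cost_usd": TOTAL_TFLOPS * 1000 * DOLLARS_PER_GFLOPS,
        # Total power draw implied by the efficiency figure.
        "power_kw": TOTAL_TFLOPS / TFLOPS_PER_KW,
        # Average contribution of one PS3, assuming an even split.
        "gflops_per_ps3": PS3_TFLOPS * 1000 / PS3_COUNT,
        # Average GPGPU performance attached to one headnode, assuming an even split.
        "gpgpu_tflops_per_headnode": GPGPU_TFLOPS / HEADNODE_COUNT,
    }


if __name__ == "__main__":
    for name, value in implied_figures().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions the machine would cost on the order of $2 million and draw roughly a third of a megawatt, with each PS3 contributing about 150 GFLOPS on average.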
The blend of GPGPUs, Cell processors within the PS3s, and Xeon processors in the headnodes is a deliberate mix intended to offer alternative programming environments that suit different applications, or that can be combined across different portions of a single application. With its large DRAM memory and local disk capacity, the multicore Xeon headnode provides a familiar, widely used computing environment for handling a wide swath of application codes. For segments of applications that require higher performance, the Cell and GPGPU architectures are available for acceleration through large-scale parallelization.
This talk will discuss programming experiences to date on the GPGPUs, Cells, and Xeons, along with the attributes of algorithms that favor each of these components of the heterogeneous machine.