Cray X1分布式共享内存架构的性能评估

T. Dunigan, J. Vetter, P. Worley
{"title":"Cray X1分布式共享内存架构的性能评估","authors":"T. Dunigan, J. Vetter, P. Worley","doi":"10.1109/CONECT.2004.1375194","DOIUrl":null,"url":null,"abstract":"The Cray X1 supercomputer is a distributed shared memory vector multiprocessor, scalable to 4096 processors and up to 65 terabytes of memory. The X1's hierarchical design uses the basic building block of the multi-streaming processor (MSP), which is capable of 12.8 GF/s for 64-bit operations. The distributed shared memory (DSM) of the X1 presents a 64-bit global address space that is directly addressable from every MSP with an interconnect bandwidth per computation rate of one byte per floating point operation. Our results show that this high bandwidth and low latency for remote memory accesses translates into improved application performance on important applications, such as an Eulerian gyrokinetic-Maxwell solver. Furthermore, this architecture naturally supports programming models like the Cray shmem API, Unified Parallel C (UPC), and coarray FORTRAN (CAF), and it is imperative to select the appropriate models to exploit these features as our benchmarks demonstrate.","PeriodicalId":224195,"journal":{"name":"Proceedings. 12th Annual IEEE Symposium on High Performance Interconnects","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"56","resultStr":"{\"title\":\"Performance evaluation of the Cray X1 distributed shared memory architecture\",\"authors\":\"T. Dunigan, J. Vetter, P. Worley\",\"doi\":\"10.1109/CONECT.2004.1375194\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Cray X1 supercomputer is a distributed shared memory vector multiprocessor, scalable to 4096 processors and up to 65 terabytes of memory. The X1's hierarchical design uses the basic building block of the multi-streaming processor (MSP), which is capable of 12.8 GF/s for 64-bit operations. The distributed shared memory (DSM) of the X1 presents a 64-bit global address space that is directly addressable from every MSP with an interconnect bandwidth per computation rate of one byte per floating point operation. Our results show that this high bandwidth and low latency for remote memory accesses translates into improved application performance on important applications, such as an Eulerian gyrokinetic-Maxwell solver. Furthermore, this architecture naturally supports programming models like the Cray shmem API, Unified Parallel C (UPC), and coarray FORTRAN (CAF), and it is imperative to select the appropriate models to exploit these features as our benchmarks demonstrate.\",\"PeriodicalId\":224195,\"journal\":{\"name\":\"Proceedings. 12th Annual IEEE Symposium on High Performance Interconnects\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-08-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"56\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. 12th Annual IEEE Symposium on High Performance Interconnects\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CONECT.2004.1375194\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. 12th Annual IEEE Symposium on High Performance Interconnects","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CONECT.2004.1375194","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 56

摘要

克雷X1超级计算机是一个分布式共享内存矢量多处理器,可扩展到4096个处理器和高达65tb的内存。X1的分层设计使用了多流处理器(MSP)的基本构建块,其64位操作的速度为12.8 GF/s。X1的分布式共享内存(DSM)提供了一个64位全局地址空间,可以从每个MSP直接寻址,每个浮点操作的每个计算率为一个字节的互连带宽。我们的结果表明,远程内存访问的高带宽和低延迟转化为重要应用程序的应用程序性能的改进,例如欧拉陀螺仪-麦克斯韦求解器。此外,这种体系结构自然地支持编程模型,如Cray shmem API、统一并行C (UPC)和coarray FORTRAN (CAF),并且必须选择合适的模型来利用我们的基准测试所展示的这些特性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Performance evaluation of the Cray X1 distributed shared memory architecture
The Cray X1 supercomputer is a distributed shared memory vector multiprocessor, scalable to 4096 processors and up to 65 terabytes of memory. The X1's hierarchical design uses the basic building block of the multi-streaming processor (MSP), which is capable of 12.8 GF/s for 64-bit operations. The distributed shared memory (DSM) of the X1 presents a 64-bit global address space that is directly addressable from every MSP with an interconnect bandwidth per computation rate of one byte per floating point operation. Our results show that this high bandwidth and low latency for remote memory accesses translates into improved application performance on important applications, such as an Eulerian gyrokinetic-Maxwell solver. Furthermore, this architecture naturally supports programming models like the Cray shmem API, Unified Parallel C (UPC), and coarray FORTRAN (CAF), and it is imperative to select the appropriate models to exploit these features as our benchmarks demonstrate.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信