Redefining IBM power system design for CORAL

IF 1.3 | Zone 4, Computer Science | Q1, Computer Science
S. Roberts;C. Mann;C. Marroquin
DOI: 10.1147/JRD.2019.2963637
Journal: IBM Journal of Research and Development, vol. 64, no. 3/4, pp. 2:1-2:10
Published: 2020-01-03
Publication type: Journal Article
Open access: No
Source: https://ieeexplore.ieee.org/document/8949743/
Citations: 2

Abstract

Stipulations in the 2014 Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) joint procurement activity not only motivated a fundamental change in IBM's high-performance computer design, which refocused IBM Power Systems on compute nodes that can scale to 200 petaflops with access to 2.5 PB of memory, but also served the commercial market for single-server applications. The distribution of both processing elements and memory required a careful look at data movement. The resultant AC922 POWER9 system features NVIDIA V100 GPUs with cache-line access granularity, more than double the IO bandwidth of PCIe Gen3, and low-latency interfaces interconnected by the state-of-the-art dual-rail Mellanox CAPI EDR HCAs running at 50 Gb/s. With processing units designed to operate at 250 and 300 W, a single system can produce up to 3,080 kW. The overall CORAL solutions achieved power usage effectiveness rankings in the top ten on the Green500. Previous Power designs used uniquely designed cabinets and scaled-up infrastructure to achieve efficiency. For successful commercial use, our design uses industry-standard 19-inch drawers and racks. Both air- and water-cooled solutions allow for use in a wide range of customer environments. This article documents the novel design features that facilitate data movement and enable new coherent programming models. It describes how three generations of system designs became the foundation for the CORAL contract fulfillment and illustrates key features and specifications of the final product.
Source journal
IBM Journal of Research and Development (Engineering and Technology: Computer Science, Hardware)
Self-citation rate: 0.00%
Articles published: 0
Review time: 6-12 weeks
About the journal: The IBM Journal of Research and Development is a peer-reviewed technical journal, published bimonthly, which features the work of authors in the science, technology, and engineering of information systems. Papers are written for the worldwide scientific research and development community and knowledgeable professionals. Submitted papers are welcome from the IBM technical community and from non-IBM authors on topics relevant to the scientific and technical content of the Journal.