Zest Checkpoint storage system for large supercomputers

2008 3rd Petascale Data Storage Workshop. Pub Date: 2008-11-01. DOI: 10.1109/PDSW.2008.4811883

P. Nowoczynski, N. Stone, J. Yanovich, J. Sommerfield

Citations: 34

Abstract

The Pittsburgh Supercomputing Center (PSC) has developed a prototype distributed file system infrastructure that vastly accelerates aggregate write bandwidth on large compute platforms. Write bandwidth, more than read bandwidth, is the dominant bottleneck in HPC I/O scenarios because of the need to write checkpoint data, visualization data, and post-processing (multi-stage) data. We have prototyped a scalable solution that will be directly applicable to future petascale compute platforms with on the order of 10^6 cores. Our design emphasizes high-efficiency scalability, low-cost commodity components, lightweight software layers, end-to-end parallelism, client-side caching and software parity, and a unique model of load-balancing outgoing I/O onto high-speed intermediate storage followed by asynchronous reconstruction to a third-party parallel file system.
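
The abstract names software parity as a design element but does not say which scheme is used. As a purely illustrative assumption, the sketch below shows single-block XOR parity, one common way to compute parity in software: a parity block is the byte-wise XOR of equal-sized data blocks, and any one lost block can be rebuilt from the survivors plus the parity. The function names (`xor_parity`, `reconstruct`) and the sample block contents are hypothetical and are not taken from Zest.

```python
def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity block.

    XOR-ing the parity with every surviving block cancels their contribution
    and leaves only the missing block.
    """
    return xor_parity(list(surviving_blocks) + [parity])


# Hypothetical checkpoint fragments of equal size (illustrative data only).
data = [b"ckpt-000", b"ckpt-001", b"ckpt-002"]
parity = xor_parity(data)

# Pretend the middle block was lost, then rebuild it.
lost = data[1]
recovered = reconstruct([data[0], data[2]], parity)
```

XOR parity tolerates the loss of only one block per parity group; schemes that survive more failures need more than a single XOR block.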
