Transparent checkpoint-restart over InfiniBand

IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub Date: 2013-12-13. DOI: 10.1145/2600212.2600219

Jiajun Cao, Gregory Kerr, K. Arya, G. Cooperman


Citations: 30

Abstract

Transparently saving the state of the InfiniBand network as part of distributed checkpointing has been a long-standing challenge for researchers. The lack of a solution has forced typical MPI implementations to include custom checkpoint-restart services that "tear down" the network, checkpoint each node in isolation, and then reconnect the network. This work presents the first example of transparent, system-initiated checkpoint-restart that directly supports InfiniBand. The new approach simplifies current practice by avoiding the need for a privileged kernel module. The generality of this approach is demonstrated by applying it both to MPI and to Berkeley UPC (Unified Parallel C) in its native mode (without MPI). Scalability is shown by checkpointing 2,048 MPI processes across 128 nodes (with 16 cores per node). The run-time overhead varies between 0.8% and 1.7%. While checkpoint times dominate, the network-only portion of the implementation is shown to require less than 100 milliseconds (not including the time to locally write application memory to stable storage).

