编译器支持自动检查点

Sung-Eun Choi, Steven J. Deitz
{"title":"编译器支持自动检查点","authors":"Sung-Eun Choi, Steven J. Deitz","doi":"10.1109/HPCSA.2002.1019157","DOIUrl":null,"url":null,"abstract":"Checkpointing is a key technology for applications on large cluster computer systems. As cluster sizes grow, component failures will become a normal part of operation, and applications will have to deal more directly with repeated failures during program runs. We describe automatic checkpointing in the ZPL compiler and its advantages over traditional library or system-based approaches that have no information about application behavior. We show that even naive compiler-inserted checkpoints can significantly reduce the size of the checkpoint recovery data, up to 73% in our application suite. We also introduce the notion of checkpoint ranges, a range of code where processors can perform a local checkpoint at any time during the range. The compiler guarantees that these local checkpoints form a globally consistent checkpoint without global coordination by ensuring that there are no in-flight messages during the checkpoint range. Checkpoint ranges help further alleviate any additional network congestion caused by checkpointing.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"Compiler support for automatic checkpointing\",\"authors\":\"Sung-Eun Choi, Steven J. Deitz\",\"doi\":\"10.1109/HPCSA.2002.1019157\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Checkpointing is a key technology for applications on large cluster computer systems. As cluster sizes grow, component failures will become a normal part of operation, and applications will have to deal more directly with repeated failures during program runs. We describe automatic checkpointing in the ZPL compiler and its advantages over traditional library or system-based approaches that have no information about application behavior. We show that even naive compiler-inserted checkpoints can significantly reduce the size of the checkpoint recovery data, up to 73% in our application suite. We also introduce the notion of checkpoint ranges, a range of code where processors can perform a local checkpoint at any time during the range. The compiler guarantees that these local checkpoints form a globally consistent checkpoint without global coordination by ensuring that there are no in-flight messages during the checkpoint range. Checkpoint ranges help further alleviate any additional network congestion caused by checkpointing.\",\"PeriodicalId\":111862,\"journal\":{\"name\":\"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2002-06-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCSA.2002.1019157\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCSA.2002.1019157","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19

摘要

检查点技术是大型集群计算机系统应用的关键技术。随着集群规模的增长,组件故障将成为操作的正常部分,应用程序将不得不更直接地处理程序运行期间的重复故障。我们描述了ZPL编译器中的自动检查点,以及它相对于传统的基于库或系统的方法的优势,这些方法没有关于应用程序行为的信息。我们表明,即使是简单的编译器插入的检查点也可以显著减少检查点恢复数据的大小,在我们的应用程序套件中最多可减少73%。我们还介绍了检查点范围的概念,即处理器可以在该范围内的任何时间执行本地检查点的代码范围。编译器通过确保检查点范围内没有正在运行的消息来保证这些局部检查点形成全局一致的检查点,而不需要全局协调。检查点范围有助于进一步减轻由检查点引起的任何额外的网络拥塞。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Compiler support for automatic checkpointing
Checkpointing is a key technology for applications on large cluster computer systems. As cluster sizes grow, component failures will become a normal part of operation, and applications will have to deal more directly with repeated failures during program runs. We describe automatic checkpointing in the ZPL compiler and its advantages over traditional library or system-based approaches that have no information about application behavior. We show that even naive compiler-inserted checkpoints can significantly reduce the size of the checkpoint recovery data, up to 73% in our application suite. We also introduce the notion of checkpoint ranges, a range of code where processors can perform a local checkpoint at any time during the range. The compiler guarantees that these local checkpoints form a globally consistent checkpoint without global coordination by ensuring that there are no in-flight messages during the checkpoint range. Checkpoint ranges help further alleviate any additional network congestion caused by checkpointing.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信