Replaying distributed programs without message logging

Robert H. B. Netzer, Yikang Xu
Published in: Proceedings. The Sixth IEEE International Symposium on High Performance Distributed Computing (Cat. No.97TB100183)
Publication date: 1997-08-05
DOI: 10.1109/HPDC.1997.622370 (https://doi.org/10.1109/HPDC.1997.622370)
Citations: 20

Abstract

Debugging long program runs can be difficult because of the delays required to repeatedly re-run the execution. Even a moderately long run of five minutes can incur aggravating delays. To address this problem, techniques exist that allow re-executing a distributed program from intermediate points by using combinations of checkpointing and message logging. In this paper we explore another idea: how to support replay without logging the contents of any message. When no messages are logged, the set of global states from which replay is possible is constrained, and it has been unknown how to compute this set without exhaustively searching the space of all global states, whose size is exponential in the number of processes. We present a simple and efficient hybrid on-the-fly/post-mortem algorithm for detecting the necessary and sufficient conditions under which parts of the execution can be replayed without message logs. A small amount of trace (two vectors) is recorded at each checkpoint and a fast post-mortem algorithm computes global states from which replay can begin. This algorithm is independent of the checkpointing technique used.
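To make the constraint concrete, here is a minimal Python sketch of the exhaustive search that the abstract describes as exponential. It is not the paper's algorithm, and the per-checkpoint `sent` and `recv` counters are hypothetical stand-ins, not necessarily the two vectors the authors record. The sketch assumes FIFO channels and assumes that a global state is usable for replay only if no message crosses it in either direction, since the contents of a crossing message would have to come from a log.

```python
from itertools import product
from typing import List, NamedTuple, Sequence, Tuple


class Checkpoint(NamedTuple):
    """Hypothetical per-checkpoint trace (an assumption, not the paper's format)."""
    sent: Tuple[int, ...]  # sent[j]: messages sent to process j before this checkpoint
    recv: Tuple[int, ...]  # recv[j]: messages received from process j before this checkpoint


def no_message_crosses(cut: Sequence[Checkpoint]) -> bool:
    """True if every message sent before the cut was also received before it,
    and vice versa. Relies on the FIFO-channel assumption."""
    n = len(cut)
    return all(
        cut[i].sent[j] == cut[j].recv[i]
        for i in range(n)
        for j in range(n)
        if i != j
    )


def replayable_cuts_brute_force(
    history: Sequence[Sequence[Checkpoint]],
) -> List[Tuple[int, ...]]:
    """Try every combination of one checkpoint per process.
    The number of combinations grows exponentially with the process count."""
    index_ranges = [range(len(checkpoints)) for checkpoints in history]
    return [
        indices
        for indices in product(*index_ranges)
        if no_message_crosses([history[p][k] for p, k in enumerate(indices)])
    ]


# Two processes: P0 sends one message to P1 between its two checkpoints,
# and P1 receives it between its own two checkpoints.
history = [
    [Checkpoint(sent=(0, 0), recv=(0, 0)), Checkpoint(sent=(0, 1), recv=(0, 0))],
    [Checkpoint(sent=(0, 0), recv=(0, 0)), Checkpoint(sent=(0, 0), recv=(1, 0))],
]
print(replayable_cuts_brute_force(history))  # [(0, 0), (1, 1)]
```

In this toy history the mixed cuts (0, 1) and (1, 0) are rejected because the single message would be either received without having been sent or sent without having been received. With n processes and k checkpoints each, the brute-force search examines k^n combinations, which is the cost the post-mortem algorithm in the paper is designed to avoid.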