Lightweight cooperative logging for fault replication in concurrent programs

Nuno Machado, P. Romano, L. Rodrigues
{"title":"Lightweight cooperative logging for fault replication in concurrent programs","authors":"Nuno Machado, P. Romano, L. Rodrigues","doi":"10.1109/DSN.2012.6263953","DOIUrl":null,"url":null,"abstract":"This paper presents CoopREP, a system that provides support for fault replication of concurrent programs, based on cooperative recording and partial log combination. CoopREP employs partial recording to reduce the amount of information that a given program instance is required to store in order to support deterministic replay. This allows to substantially reduce the overhead imposed by the instrumentation of the code, but raises the problem of finding the combination of logs capable of replaying the fault. CoopREP tackles this issue by introducing several innovative statistical analysis techniques aimed at guiding the search of partial logs to be combined and used during the replay phase. CoopREP has been evaluated using both standard benchmarks for multi-threaded applications and a real-world application. The results highlight that CoopREP can successfully replay concurrency bugs involving tens of thousands of memory accesses, reducing logging overhead with respect to state of the art non-cooperative logging schemes by up to 50 times in computationally intensive applications.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSN.2012.6263953","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 15

Abstract

This paper presents CoopREP, a system that provides support for fault replication of concurrent programs, based on cooperative recording and partial log combination. CoopREP employs partial recording to reduce the amount of information that a given program instance is required to store in order to support deterministic replay. This allows to substantially reduce the overhead imposed by the instrumentation of the code, but raises the problem of finding the combination of logs capable of replaying the fault. CoopREP tackles this issue by introducing several innovative statistical analysis techniques aimed at guiding the search of partial logs to be combined and used during the replay phase. CoopREP has been evaluated using both standard benchmarks for multi-threaded applications and a real-world application. The results highlight that CoopREP can successfully replay concurrency bugs involving tens of thousands of memory accesses, reducing logging overhead with respect to state of the art non-cooperative logging schemes by up to 50 times in computationally intensive applications.
用于并发程序中错误复制的轻量级协作日志记录
提出了一种基于协同记录和部分日志组合的并发程序故障复制支持系统CoopREP。CoopREP采用部分记录来减少给定程序实例为支持确定性重播而需要存储的信息量。这可以大大减少代码插装所带来的开销,但也带来了寻找能够重播故障的日志组合的问题。CoopREP通过引入一些创新的统计分析技术来解决这个问题,这些技术旨在指导对部分日志的搜索,以便在重放阶段进行组合和使用。已经使用多线程应用程序的标准基准和实际应用程序对CoopREP进行了评估。结果表明,在计算密集型应用程序中,CoopREP可以成功地重播涉及数万个内存访问的并发性错误,相对于最先进的非协作日志记录方案,将日志记录开销减少了50倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信