DESCRY: reproducing system-level concurrency failures

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering Pub Date : 2017-08-21 DOI:10.1145/3106237.3106266

Tingting Yu, T. S. Zaman, Chao Wang

{"title":"DESCRY: reproducing system-level concurrency failures","authors":"Tingting Yu, T. S. Zaman, Chao Wang","doi":"10.1145/3106237.3106266","DOIUrl":null,"url":null,"abstract":"Concurrent systems may fail in the field due to various elusive faults such as race conditions. Reproducing such failures is hard because (1) concurrency failures at the system level often involve multiple processes or event handlers (e.g., software signals), which cannot be handled by existing tools for reproducing intra-process (thread-level) failures; (2) detailed field data, such as user input, file content and interleaving schedule, may not be available to developers; and (3) the debugging environment may differ from the deployed environment, which further complicates failure reproduction. To address these problems, we present DESCRY, the first fully automated tool for reproducing system-level concurrency failures based only on default log messages collected from the field. DESCRY uses a combination of static and dynamic analysis techniques, together with symbolic execution, to synthesize both the failure-inducing data input and the interleaving schedule, and leverages them to deterministically replay the failed execution using existing virtual platforms. We have evaluated DESCRY on 22 real-world multi-process Linux applications with a total of 236,875 lines of code to demonstrate both its effectiveness and its efficiency in reproducing failures that no other tool can reproduce.","PeriodicalId":313494,"journal":{"name":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","volume":"98 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3106237.3106266","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 17

Abstract

Concurrent systems may fail in the field due to various elusive faults such as race conditions. Reproducing such failures is hard because (1) concurrency failures at the system level often involve multiple processes or event handlers (e.g., software signals), which cannot be handled by existing tools for reproducing intra-process (thread-level) failures; (2) detailed field data, such as user input, file content and interleaving schedule, may not be available to developers; and (3) the debugging environment may differ from the deployed environment, which further complicates failure reproduction. To address these problems, we present DESCRY, the first fully automated tool for reproducing system-level concurrency failures based only on default log messages collected from the field. DESCRY uses a combination of static and dynamic analysis techniques, together with symbolic execution, to synthesize both the failure-inducing data input and the interleaving schedule, and leverages them to deterministically replay the failed execution using existing virtual platforms. We have evaluated DESCRY on 22 real-world multi-process Linux applications with a total of 236,875 lines of code to demonstrate both its effectiveness and its efficiency in reproducing failures that no other tool can reproduce.

查看原文本刊更多论文

DESCRY:再现系统级并发失败

并发系统可能由于各种难以捉摸的故障(如竞争条件)而在现场失败。再现这样的故障是困难的，因为(1)系统级的并发故障通常涉及多个进程或事件处理程序(例如，软件信号)，而现有的再现进程内(线程级)故障的工具无法处理这些故障;(2)详细的现场数据，如用户输入、文件内容和交错时间表，可能无法提供给开发人员;(3)调试环境可能与部署环境不同，这进一步使故障再现变得复杂。为了解决这些问题，我们提出了DESCRY，这是第一个完全自动化的工具，仅基于从现场收集的默认日志消息来再现系统级并发性故障。DESCRY结合了静态和动态分析技术，以及符号执行，综合了导致故障的数据输入和交错调度，并利用它们确定地使用现有的虚拟平台重播失败的执行。我们在22个真实的多进程Linux应用程序上评估了DESCRY，总共有236,875行代码，以证明它在再现其他工具无法再现的故障方面的有效性和效率。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering

自引率

0.00%

发文量