Detecting and surviving data races using complementary schedules

Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. Pub Date: 2011-10-23. DOI: 10.1145/2043556.2043590

K. Veeraraghavan, Peter M. Chen, J. Flinn, S. Narayanasamy

Citations: 93

Abstract

Data races are a common source of errors in multithreaded programs. In this paper, we show how to protect a program from data race errors at runtime by executing multiple replicas of the program with complementary thread schedules. Complementary schedules are a set of replica thread schedules crafted to ensure that replicas diverge only if a data race occurs and to make it very likely that harmful data races cause divergences. Our system, called Frost, uses complementary schedules to cause at least one replica to avoid the order of racing instructions that leads to incorrect program execution for most harmful data races. Frost introduces outcome-based race detection, which detects data races by comparing the state of replicas executing complementary schedules. We show that this method is substantially faster than existing dynamic race detectors for unmanaged code. To help programs survive bugs in production, Frost also diagnoses the data race bug and selects an appropriate recovery strategy, such as choosing a replica that is likely to be correct or executing more replicas to gather additional information. Frost controls the thread schedules of replicas by running all threads of a replica non-preemptively on a single core. To scale the program to multiple cores, Frost runs a third replica in parallel to generate checkpoints of the program's likely future states; these checkpoints let Frost divide program execution into multiple epochs, which it then runs in parallel. We evaluate Frost using 11 real data race bugs in desktop and server applications. Frost both detects and survives all of these data races. Since Frost runs three replicas, its utilization cost is 3x. However, if there are spare cores to absorb this increased utilization, Frost adds only 3–12% overhead to application runtime.
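To make the idea of complementary schedules and outcome-based detection concrete, here is a toy sketch in Python. It is not Frost's implementation. It assumes a single synchronization-free region in which each "thread" is modeled as a function over a shared dict, and the names `run_replica`, `complementary_orders`, and `outcome_based_check` are hypothetical. Each replica runs every thread to completion non-preemptively; the second replica uses the reverse thread order, so any two threads run in opposite orders across the pair, and differing final states reveal an order-dependent race.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model (assumption, not Frost's design): program state is a
# dict of shared variables, and each "thread" is the code it executes within
# one synchronization-free region.
State = Dict[str, int]
Thread = Callable[[State], None]


def run_replica(threads: List[Thread], order: List[int], init: State) -> State:
    """Run every thread to completion, one at a time (non-preemptively),
    in the given order, on a private copy of the initial state."""
    state = dict(init)
    for tid in order:
        threads[tid](state)
    return state


def complementary_orders(n: int) -> Tuple[List[int], List[int]]:
    """Two schedules in which every pair of threads runs in opposite order."""
    forward = list(range(n))
    return forward, forward[::-1]


def outcome_based_check(threads: List[Thread], init: State) -> Tuple[bool, State, State]:
    """Run two replicas with complementary schedules and compare outcomes.
    In this model (no synchronization inside the region), divergent final
    states mean the result depends on the order of unsynchronized accesses."""
    fwd, rev = complementary_orders(len(threads))
    a = run_replica(threads, fwd, init)
    b = run_replica(threads, rev, init)
    return a != b, a, b


# Usage: thread 1 reads x while thread 0 writes it, with no synchronization.
def writer(s: State) -> None:
    s["x"] = 1


def reader(s: State) -> None:
    s["y"] = s["x"] * 10


raced, a, b = outcome_based_check([writer, reader], {"x": 0, "y": 0})
print(raced, a, b)  # True {'x': 1, 'y': 10} {'x': 1, 'y': 0}
```

In this sketch, threads that touch disjoint variables always produce identical replica states, so they are never flagged. Conversely, a race whose two orders happen to yield the same state (for example, two non-preempted increments of the same counter) also produces no divergence, because the check looks only at outcomes.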
