在系统范围故障下可恢复的互斥

Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing Pub Date : 2018-07-23 DOI:10.1145/3212734.3212755

W. Golab, Danny Hendler

{"title":"在系统范围故障下可恢复的互斥","authors":"W. Golab, Danny Hendler","doi":"10.1145/3212734.3212755","DOIUrl":null,"url":null,"abstract":"Recoverable mutual exclusion (RME) is a variation on the classic mutual exclusion (ME) problem that allows processes to crash and recover. The time complexity of RME algorithms is quantified in the same way as for ME, namely by counting remote memory references -- expensive memory operations that traverse the processor-to-memory interconnect. Prior work has established that the RMR complexity of the RME problem for n processes is Θ(log n) for the class of algorithms that use read/write registers and single-word comparison primitives such as Compare-And-Swap (Golab and Ramaraju 2016), O(log n / log log n) for the class of algorithms that use read/write registers and additional single-word read-modify-primitives such as Fetch-And-Store (Golab and Hendler 2017), and Θ(1) for the class of algorithms that use read/write registers and specialized double-word read-modify-write primitives (Golab and Hendler 2017). These complexity bounds hold in a model of computation where processes may fail independently, and where a process that fails while accessing the mutex is required to recover eventually. This body of work leaves open two important questions: (i) what is the tight bound on the RMR complexity of RME for the class of algorithms that use read/write registers and commonly supported single-word read-modify-primitives; and (ii) how is the RMR complexity of RME affected by variations in the failure model? This paper answers both questions partially by showing that RME can be solved using O(1) RMRs per passage in the worst case in a model where failures are system-wide (i.e., all processes crash simultaneously), and processes receive additional information from the environment regarding the occurrence of the failure. The upper bound algorithm we present relies crucially on a novel RMR-efficient barrier that processes use to synchronize recovery actions after each failure. The barrier uses read/write registers and single-word Compare-And-Swap only. Additionally, we present a transformation that can add properties such as critical section re-entry and a strong notion of starvation freedom to any RME algorithm while preserving its asymptotic RMR complexity.","PeriodicalId":198284,"journal":{"name":"Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"Recoverable Mutual Exclusion Under System-Wide Failures\",\"authors\":\"W. Golab, Danny Hendler\",\"doi\":\"10.1145/3212734.3212755\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recoverable mutual exclusion (RME) is a variation on the classic mutual exclusion (ME) problem that allows processes to crash and recover. The time complexity of RME algorithms is quantified in the same way as for ME, namely by counting remote memory references -- expensive memory operations that traverse the processor-to-memory interconnect. Prior work has established that the RMR complexity of the RME problem for n processes is Θ(log n) for the class of algorithms that use read/write registers and single-word comparison primitives such as Compare-And-Swap (Golab and Ramaraju 2016), O(log n / log log n) for the class of algorithms that use read/write registers and additional single-word read-modify-primitives such as Fetch-And-Store (Golab and Hendler 2017), and Θ(1) for the class of algorithms that use read/write registers and specialized double-word read-modify-write primitives (Golab and Hendler 2017). These complexity bounds hold in a model of computation where processes may fail independently, and where a process that fails while accessing the mutex is required to recover eventually. This body of work leaves open two important questions: (i) what is the tight bound on the RMR complexity of RME for the class of algorithms that use read/write registers and commonly supported single-word read-modify-primitives; and (ii) how is the RMR complexity of RME affected by variations in the failure model? This paper answers both questions partially by showing that RME can be solved using O(1) RMRs per passage in the worst case in a model where failures are system-wide (i.e., all processes crash simultaneously), and processes receive additional information from the environment regarding the occurrence of the failure. The upper bound algorithm we present relies crucially on a novel RMR-efficient barrier that processes use to synchronize recovery actions after each failure. The barrier uses read/write registers and single-word Compare-And-Swap only. Additionally, we present a transformation that can add properties such as critical section re-entry and a strong notion of starvation freedom to any RME algorithm while preserving its asymptotic RMR complexity.\",\"PeriodicalId\":198284,\"journal\":{\"name\":\"Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing\",\"volume\":\"10 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-07-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3212734.3212755\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3212734.3212755","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 20

摘要

可恢复互斥(RME)是经典互斥(ME)问题的一种变体，它允许进程崩溃并恢复。RME算法的时间复杂度以与ME相同的方式进行量化，即通过计算远程内存引用——遍历处理器到内存互连的昂贵内存操作。先前的工作已经确定，对于使用读写寄存器和单字比较原语(如比较和交换)的算法类，n个进程的RME问题的RMR复杂性为Θ(log n) (Golab和Ramaraju 2016)，对于使用读写寄存器和其他单字读-修改原语(如Fetch-And-Store)的算法类，RMR复杂性为O(log n / log log n) (Golab和Hendler 2017)。和Θ(1)用于使用读/写寄存器和专用双字读-修改-写原语的算法类(Golab和Hendler 2017)。这些复杂性界限适用于这样的计算模型，即进程可能独立失败，并且访问互斥锁时失败的进程需要最终恢复。这项工作留下了两个重要的问题:(i)对于使用读/写寄存器和通常支持的单字读-修改原语的算法类，RME的RMR复杂性的严格界限是什么;(ii) RME的RMR复杂性如何受到失效模型变化的影响?本文部分地回答了这两个问题，在模型的最坏情况下，故障是系统范围的(即，所有进程同时崩溃)，并且进程从环境接收有关故障发生的附加信息，通过展示每个通道使用O(1)个RME可以解决。我们提出的上限算法主要依赖于一种新的rmr高效屏障，该屏障用于在每次故障后同步恢复动作。屏障只使用读/写寄存器和单字比较与交换。此外，我们提出了一种转换，该转换可以为任何RME算法添加诸如临界区重入和强饥饿自由概念等属性，同时保持其渐近RMR复杂性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Recoverable Mutual Exclusion Under System-Wide Failures

Recoverable mutual exclusion (RME) is a variation on the classic mutual exclusion (ME) problem that allows processes to crash and recover. The time complexity of RME algorithms is quantified in the same way as for ME, namely by counting remote memory references -- expensive memory operations that traverse the processor-to-memory interconnect. Prior work has established that the RMR complexity of the RME problem for n processes is Θ(log n) for the class of algorithms that use read/write registers and single-word comparison primitives such as Compare-And-Swap (Golab and Ramaraju 2016), O(log n / log log n) for the class of algorithms that use read/write registers and additional single-word read-modify-primitives such as Fetch-And-Store (Golab and Hendler 2017), and Θ(1) for the class of algorithms that use read/write registers and specialized double-word read-modify-write primitives (Golab and Hendler 2017). These complexity bounds hold in a model of computation where processes may fail independently, and where a process that fails while accessing the mutex is required to recover eventually. This body of work leaves open two important questions: (i) what is the tight bound on the RMR complexity of RME for the class of algorithms that use read/write registers and commonly supported single-word read-modify-primitives; and (ii) how is the RMR complexity of RME affected by variations in the failure model? This paper answers both questions partially by showing that RME can be solved using O(1) RMRs per passage in the worst case in a model where failures are system-wide (i.e., all processes crash simultaneously), and processes receive additional information from the environment regarding the occurrence of the failure. The upper bound algorithm we present relies crucially on a novel RMR-efficient barrier that processes use to synchronize recovery actions after each failure. The barrier uses read/write registers and single-word Compare-And-Swap only. Additionally, we present a transformation that can add properties such as critical section re-entry and a strong notion of starvation freedom to any RME algorithm while preserving its asymptotic RMR complexity.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing

自引率

0.00%

发文量