Lock–Unlock

ACM Transactions on Computer Systems (TOCS) Pub Date : 2019-03-14 DOI:10.1145/3301501

R. Guerraoui, Hugo Guiroux, Renaud Lachaize, Vivien Quéma, Vasileios Trigonakis


Citations: 13

Abstract

A plethora of optimized mutex lock algorithms have been designed over the past 25 years to mitigate performance bottlenecks related to critical sections and locks. Unfortunately, there is currently no broad study of the behavior of these optimized lock algorithms on realistic applications that consider different performance metrics, such as energy efficiency and tail latency. In this article, we perform a thorough and practical analysis of synchronization, with the goal of providing software developers with enough information to design fast, scalable, and energy-efficient synchronization in their systems. First, we perform a performance study of 28 state-of-the-art mutex lock algorithms, on 40 applications, on four different multicore machines. We consider not only throughput (traditionally the main performance metric) but also energy efficiency and tail latency, which are becoming increasingly important. Second, we present an in-depth analysis in which we summarize our findings for all the studied applications. In particular, we describe nine different lock-related performance bottlenecks, and we propose six guidelines helping software developers with their choice of a lock algorithm according to the different lock properties and the application characteristics. 
From our detailed analysis, we make several observations regarding locking algorithms and application behaviors, several of which have not been previously discovered: (i) applications stress not only the lock–unlock interface but also the full locking API (e.g., trylocks, condition variables); (ii) the memory footprint of a lock can directly affect the application performance; (iii) for many applications, the interaction between locks and scheduling is an important application performance factor; (iv) lock tail latencies may or may not affect application tail latency; (v) no single lock is systematically the best; (vi) choosing the best lock is difficult; and (vii) energy efficiency and throughput go hand in hand in the context of lock algorithms. These findings highlight that locking involves more considerations than the simple lock/unlock interface and call for further research on designing low-memory footprint adaptive locks that fully and efficiently support the full lock interface, and consider all performance metrics.
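To make observation (i) concrete, the following is a minimal sketch of code that uses all three parts of a locking API mentioned in the abstract: plain lock/unlock, a trylock, and condition variables. It is not taken from the article; the `BoundedQueue` class, its `try_size` method, and the `run_demo` helper are hypothetical names chosen for illustration, and Python's standard `threading` module stands in for whatever lock library an application actually uses.

```python
import threading
from collections import deque


class BoundedQueue:
    """Hypothetical bounded queue that exercises lock/unlock, trylock,
    and condition variables on a single mutex (illustration only)."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._lock = threading.Lock()
        # Both condition variables share the same underlying mutex.
        self._not_empty = threading.Condition(self._lock)
        self._not_full = threading.Condition(self._lock)

    def put(self, item):
        # Entering the `with` block locks the mutex; leaving it unlocks.
        with self._not_full:
            while len(self._items) >= self._capacity:
                # wait() releases the mutex while blocked and
                # re-acquires it before returning.
                self._not_full.wait()
            self._items.append(item)
            self._not_empty.notify()

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()
            return item

    def try_size(self):
        # Trylock: return None instead of blocking if the mutex is held.
        if not self._lock.acquire(blocking=False):
            return None
        try:
            return len(self._items)
        finally:
            self._lock.release()


def run_demo(n_items=100, capacity=4):
    """Run one producer and one consumer; return items in arrival order."""
    queue = BoundedQueue(capacity)
    received = []

    def producer():
        for i in range(n_items):
            queue.put(i)

    def consumer():
        for _ in range(n_items):
            received.append(queue.get())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

In a structure like this, a replacement lock algorithm would have to support the condition-variable wait and the non-blocking acquire as well as the basic lock/unlock pair, which is the kind of dependency on the full interface that observation (i) describes.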




Source journal

ACM Transactions on Computer Systems (TOCS)

Self-citation rate

0.00%

Articles published