Efficient Synchronization: Let Them Eat QOLB

Conference Proceedings. The 24th Annual International Symposium on Computer Architecture. Pub Date: 1997-06-01. DOI: 10.1145/264107.264166

A. Kägi, Doug Burger, J. Goodman


Citations: 42

Abstract

Efficient synchronization primitives are essential for achieving high performance in fine-grain, shared-memory parallel programs. One function of synchronization primitives is to enable exclusive access to shared data and critical sections of code. This paper makes three contributions. (1) We enumerate the five sources of overhead that locking synchronization primitives can incur. (2) We describe four mechanisms (local spinning, queue-based locking, collocation, and synchronized prefetch) that reduce these synchronization overheads. (3) With detailed simulations, we show the extent to which these four mechanisms can improve the performance of shared-memory programs. We evaluate the space of these mechanisms using seventeen synchronization constructs, which are formed from six base types of locks (TEST&SET, TEST&TEST&SET, MCS, LH, M, and QOLB). We show that large performance gains (speedups of more than 1.5 for three of five benchmarks) can be achieved if at least three optimizing mechanisms are used simultaneously. We find that QOLB, which incorporates all four mechanisms, outperforms all other primitives (including reactive synchronization) in all cases. Finally, we demonstrate the superior performance of a low-cost implementation of QOLB, which runs on an unmodified cluster of commodity workstations.
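As a rough illustration of the first mechanism named in the abstract, local spinning, the sketch below contrasts a TEST&SET lock with a TEST&TEST&SET lock in C++. It is not taken from the paper: the class names `TasLock` and `TtasLock`, the helper `run_counter`, and the choice of `std::atomic<bool>` are all assumptions made for this example, and it does not attempt to model queue-based locking, collocation, synchronized prefetch, or QOLB itself.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// TEST&SET (illustrative): every spin iteration is an atomic
// read-modify-write, so each waiter keeps requesting the lock's
// cache line in exclusive state.
class TasLock {
    std::atomic<bool> held{false};
public:
    void lock() {
        while (held.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();  // be polite if threads outnumber cores
        }
    }
    void unlock() { held.store(false, std::memory_order_release); }
};

// TEST&TEST&SET (illustrative): waiters spin on a plain load, which can be
// satisfied from a locally cached copy, and attempt the atomic exchange
// only once the lock appears free.
class TtasLock {
    std::atomic<bool> held{false};
public:
    void lock() {
        for (;;) {
            while (held.load(std::memory_order_relaxed)) {
                std::this_thread::yield();  // local spinning on a read
            }
            if (!held.exchange(true, std::memory_order_acquire)) {
                return;  // the exchange saw "free", so this thread owns the lock
            }
        }
    }
    void unlock() { held.store(false, std::memory_order_release); }
};

// Hypothetical helper: num_threads threads each increment a shared,
// non-atomic counter iters_per_thread times under the given lock type.
template <typename Lock>
long run_counter(int num_threads, int iters_per_thread) {
    Lock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters_per_thread; ++i) {
                lock.lock();
                ++counter;  // critical section
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Calling `run_counter<TtasLock>(4, 10000)` should return 40000 when mutual exclusion holds. The sketch only checks correctness; it says nothing about the relative performance the paper measures, which depends on the memory system being simulated.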

