Reducing contention through priority updates

Julian Shun, G. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons
{"title":"通过优先级更新减少争用","authors":"Julian Shun, G. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons","doi":"10.1145/2486159.2486189","DOIUrl":null,"url":null,"abstract":"Memory contention can be a serious performance bottleneck in concurrent programs on shared-memory multicore architectures. Having all threads write to a small set of shared locations, for example, can lead to orders of magnitude loss in performance relative to all threads writing to distinct locations, or even relative to a single thread doing all the writes. Shared write access, however, can be very useful in parallel algorithms, concurrent data structures, and protocols for communicating among threads. We study the \"priority update\" operation as a useful primitive for limiting write contention in parallel and concurrent programs. A priority update takes as arguments a memory location, a new value, and a comparison function >p that enforces a partial order over values. The operation atomically compares the new value with the current value in the memory location, and writes the new value only if it has higher priority according to >p. On the implementation side, we show that if implemented appropriately, priority updates greatly reduce memory contention over standard writes or other atomic operations when locations have a high degree of sharing. This is shown both experimentally and theoretically. On the application side, we describe several uses of priority updates for implementing parallel algorithms and concurrent data structures, often in a way that is deterministic, guarantees progress, and avoids serial bottlenecks. We present experiments showing that a variety of such algorithms and data structures perform well under high degrees of sharing. Given the results, we believe that the priority update operation serves as a useful parallel primitive and good programming abstraction as (1) the user largely need not worry about the degree of sharing, (2) it can be used to avoid non-determinism since, in the common case when >p is a total order, priority updates commute, and (3) it has many applications to programs using shared data.","PeriodicalId":353007,"journal":{"name":"Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"47","resultStr":"{\"title\":\"Reducing contention through priority updates\",\"authors\":\"Julian Shun, G. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons\",\"doi\":\"10.1145/2486159.2486189\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Memory contention can be a serious performance bottleneck in concurrent programs on shared-memory multicore architectures. Having all threads write to a small set of shared locations, for example, can lead to orders of magnitude loss in performance relative to all threads writing to distinct locations, or even relative to a single thread doing all the writes. Shared write access, however, can be very useful in parallel algorithms, concurrent data structures, and protocols for communicating among threads. We study the \\\"priority update\\\" operation as a useful primitive for limiting write contention in parallel and concurrent programs. A priority update takes as arguments a memory location, a new value, and a comparison function >p that enforces a partial order over values. 
The operation atomically compares the new value with the current value in the memory location, and writes the new value only if it has higher priority according to >p. On the implementation side, we show that if implemented appropriately, priority updates greatly reduce memory contention over standard writes or other atomic operations when locations have a high degree of sharing. This is shown both experimentally and theoretically. On the application side, we describe several uses of priority updates for implementing parallel algorithms and concurrent data structures, often in a way that is deterministic, guarantees progress, and avoids serial bottlenecks. We present experiments showing that a variety of such algorithms and data structures perform well under high degrees of sharing. Given the results, we believe that the priority update operation serves as a useful parallel primitive and good programming abstraction as (1) the user largely need not worry about the degree of sharing, (2) it can be used to avoid non-determinism since, in the common case when >p is a total order, priority updates commute, and (3) it has many applications to programs using shared data.\",\"PeriodicalId\":353007,\"journal\":{\"name\":\"Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-07-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"47\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2486159.2486189\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2486159.2486189","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 47

Abstract

Memory contention can be a serious performance bottleneck in concurrent programs on shared-memory multicore architectures. Having all threads write to a small set of shared locations, for example, can lead to orders of magnitude loss in performance relative to all threads writing to distinct locations, or even relative to a single thread doing all the writes. Shared write access, however, can be very useful in parallel algorithms, concurrent data structures, and protocols for communicating among threads. We study the "priority update" operation as a useful primitive for limiting write contention in parallel and concurrent programs. A priority update takes as arguments a memory location, a new value, and a comparison function >p that enforces a partial order over values. The operation atomically compares the new value with the current value in the memory location, and writes the new value only if it has higher priority according to >p. On the implementation side, we show that if implemented appropriately, priority updates greatly reduce memory contention over standard writes or other atomic operations when locations have a high degree of sharing. This is shown both experimentally and theoretically. On the application side, we describe several uses of priority updates for implementing parallel algorithms and concurrent data structures, often in a way that is deterministic, guarantees progress, and avoids serial bottlenecks. We present experiments showing that a variety of such algorithms and data structures perform well under high degrees of sharing. Given the results, we believe that the priority update operation serves as a useful parallel primitive and good programming abstraction as (1) the user largely need not worry about the degree of sharing, (2) it can be used to avoid non-determinism since, in the common case when >p is a total order, priority updates commute, and (3) it has many applications to programs using shared data.
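The abstract specifies the priority update's semantics (atomically compare the new value against the stored one under >p, and write only if the new value has higher priority) but not its implementation. A natural realization is a compare-and-swap loop that retries only while the new value still beats the currently stored value. The sketch below is a minimal illustration under that assumption; the name `priority_update`, its parameters, and the use of `std::less` as the comparison >p are illustrative choices, not the authors' actual API.

```cpp
// Sketch of a priority update built on a compare-and-swap (CAS) loop.
// The Compare parameter plays the role of the comparison function >p:
// higher_priority(a, b) == true means "a has higher priority than b".
#include <atomic>
#include <cstdio>
#include <functional>

template <typename T, typename Compare>
void priority_update(std::atomic<T>& location, T new_val, Compare higher_priority) {
    T current = location.load();
    // Keep trying only while the new value still has higher priority than
    // the stored value; otherwise the call is a no-op and performs no write.
    while (higher_priority(new_val, current)) {
        if (location.compare_exchange_weak(current, new_val)) {
            return;  // installed the higher-priority value
        }
        // On failure, compare_exchange_weak reloads `current`;
        // the loop re-checks the priority against the fresh value.
    }
}

int main() {
    std::atomic<int> cell{100};
    // With std::less as >p, "higher priority" means "smaller value",
    // so the operation behaves like an atomic write-with-minimum.
    priority_update(cell, 42, std::less<int>());  // succeeds: 42 < 100
    priority_update(cell, 77, std::less<int>());  // no-op:    77 > 42
    std::printf("cell = %d\n", cell.load());      // prints "cell = 42"
    return 0;
}
```

The check before the CAS is what gives the contention benefit suggested by the abstract: once a high-priority value is in place, most competing updates fail the priority test and complete as plain reads rather than writes, so heavily shared locations are not repeatedly invalidated.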