{"title":"Performance implications of fence-based memory models","authors":"H. Boehm","doi":"10.1145/1988915.1988919","DOIUrl":null,"url":null,"abstract":"Most mainstream shared-memory parallel programming languages are converging to a memory model, or shared variable semantics, centered on providing sequential consistency for most data-race-free programs.\n OpenMP, along with a small number of other languages, defines its memory model in terms of implicit fence (e.g. OpenMP flush) operations that force memory accesses to become visible to other threads in order. Synchronization operations provided by the language implicitly include such fences. In the simplest cases this is equivalent to a promise of sequential consistency for data-race-free programs.\n However, real languages typically also provide atomic operations with weak memory ordering constraints, such as the OpenMP atomic directives. These break the above equivalence, making the fence-based model stronger in ways that are observable, but not generally useful. As a result, conventional lock implementations are often accidentally prohibited, adding significant overhead for uncontended locks.\n We show that this problem affects both OpenMP and, in a more subtle way, UPC. We have been working with the OpenMP ARB to resolve these issues in future versions of OpenMP.","PeriodicalId":130040,"journal":{"name":"Workshop on Memory System Performance and Correctness","volume":"90 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Memory System Performance and Correctness","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1988915.1988919","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
Most mainstream shared-memory parallel programming languages are converging to a memory model, or shared variable semantics, centered on providing sequential consistency for most data-race-free programs.
OpenMP, along with a small number of other languages, defines its memory model in terms of implicit fence (e.g. OpenMP flush) operations that force memory accesses to become visible to other threads in order. Synchronization operations provided by the language implicitly include such fences. In the simplest cases this is equivalent to a promise of sequential consistency for data-race-free programs.
However, real languages typically also provide atomic operations with weak memory ordering constraints, such as the OpenMP atomic directives. These break the above equivalence, making the fence-based model stronger in ways that are observable, but not generally useful. As a result, conventional lock implementations are often accidentally prohibited, adding significant overhead for uncontended locks.
We show that this problem affects both OpenMP and, in a more subtle way, UPC. We have been working with the OpenMP ARB to resolve these issues in future versions of OpenMP.