The fuzzy barrier: a mechanism for high speed synchronization of processors

ASPLOS III Pub Date : 1989-04-01 DOI:10.1145/70082.68187
Rajiv Gupta
{"title":"The fuzzy barrier: a mechanism for high speed synchronization of processors","authors":"Rajiv Gupta","doi":"10.1145/70082.68187","DOIUrl":null,"url":null,"abstract":"Parallel programs are commonly written using barriers to synchronize parallel processes. Upon reaching a barrier, a processor must stall until all participating processors reach the barrier. A software implementation of the barrier mechanism using shared variables has two major drawbacks. Firstly, the execution of the barrier may be slow as it may not only require execution of several instructions and but also result in hot-spot accesses. Secondly, processors that are stalled waiting for other processors to reach the barrier are essentially idling and cannot do any useful work. In this paper, the notion of the fuzzy barrier is presented, that avoids the above drawbacks. The first problem is avoided by implementing the mechanism in hardware. The second problem is solved by extending the barrier concept to include a region of statements that can be executed by a processor while it awaits synchronization. The barrier regions are constructed by a compiler and consist of several instructions such that a processor is ready to synchronize upon reaching the first instruction in this region and must synchronize before exiting the region. When synchronization does occur, the processors could be executing at any point in their respective barrier regions. The larger the barrier region, the more likely it is that none of the processors will have to stall. Preliminary investigations show that barrier regions can be large and the use of program transformations can significantly increase their size. Examples of situations where such a mechanism can result in improved performance are presented. Results based on a software implementation of the fuzzy barrier on the Encore multiprocessor indicate that the synchronization overhead can be greatly reduced using the mechanism.","PeriodicalId":359206,"journal":{"name":"ASPLOS III","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1989-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"152","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ASPLOS III","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/70082.68187","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 152

Abstract

Parallel programs are commonly written using barriers to synchronize parallel processes. Upon reaching a barrier, a processor must stall until all participating processors reach the barrier. A software implementation of the barrier mechanism using shared variables has two major drawbacks. Firstly, the execution of the barrier may be slow as it may not only require execution of several instructions and but also result in hot-spot accesses. Secondly, processors that are stalled waiting for other processors to reach the barrier are essentially idling and cannot do any useful work. In this paper, the notion of the fuzzy barrier is presented, that avoids the above drawbacks. The first problem is avoided by implementing the mechanism in hardware. The second problem is solved by extending the barrier concept to include a region of statements that can be executed by a processor while it awaits synchronization. The barrier regions are constructed by a compiler and consist of several instructions such that a processor is ready to synchronize upon reaching the first instruction in this region and must synchronize before exiting the region. When synchronization does occur, the processors could be executing at any point in their respective barrier regions. The larger the barrier region, the more likely it is that none of the processors will have to stall. Preliminary investigations show that barrier regions can be large and the use of program transformations can significantly increase their size. Examples of situations where such a mechanism can result in improved performance are presented. Results based on a software implementation of the fuzzy barrier on the Encore multiprocessor indicate that the synchronization overhead can be greatly reduced using the mechanism.
模糊屏障:处理器高速同步的一种机制
并行程序通常使用屏障来同步并行进程。到达屏障后,处理器必须停止运行,直到所有参与的处理器到达屏障。使用共享变量的屏障机制的软件实现有两个主要缺点。首先,屏障的执行可能很慢,因为它不仅需要执行几个指令,而且还会导致热点访问。其次,等待其他处理器到达屏障的处理器实际上是闲置的,不能做任何有用的工作。本文提出了模糊屏障的概念,避免了上述缺陷。通过在硬件中实现该机制,可以避免第一个问题。第二个问题是通过扩展屏障概念来解决的,它包含了处理器在等待同步时可以执行的语句区域。屏障区域由编译器构造,由几个指令组成,这样处理器在到达该区域的第一条指令时就准备好同步,并且必须在退出该区域之前同步。当同步发生时,处理器可以在各自屏障区域的任何点执行。屏障区域越大,就越有可能没有一个处理器必须停止运行。初步调查表明,屏障区域可能很大,使用程序转换可以显着增加它们的大小。本文给出了这种机制可以提高性能的情况示例。基于Encore多处理器上模糊屏障的软件实现结果表明,使用该机制可以大大降低同步开销。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信