Optimally synchronizing DOACROSS loops on shared memory multiprocessors

R. Rajamony, A. Cox
DOI: 10.1109/PACT.1997.644017
Published in: Proceedings 1997 International Conference on Parallel Architectures and Compilation Techniques
Publication date: 1997-11-11
Citations: 12

Abstract

We present two algorithms to minimize the amount of synchronization added when parallelizing a loop with loop-carried dependences. In contrast to existing schemes, our algorithms add less synchronization while preserving the parallelism that can be extracted from the loop. Our first algorithm uses an interval graph representation of the dependence "overlap" to find a synchronization placement in time almost linear in the number of dependences. Although this solution may be suboptimal, it is still better than that obtained using existing methods, which first eliminate redundant dependences and then synchronize the remaining ones. Determining the optimal synchronization is an NP-complete problem. Our second algorithm therefore uses integer programming to determine the optimal solution. We first use a polynomial-time algorithm to find a minimal search space that must contain the optimal solution. Then, we formulate the problem of choosing the minimal synchronization from the search space as a set-cover problem, and solve it exactly using 0-1 integer programming. We show the performance impact of our algorithms by synchronizing a set of synthetic loops on an 8-processor Convex Exemplar. The greedily synchronized loops ran between 7% and 22% faster than those synchronized by the best existing algorithm. Relative to the same base, the optimally synchronized loops ran between 10% and 22% faster.
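The paper's first algorithm is not reproduced here, but its core idea, covering overlapping dependence "intervals" with as few synchronization points as possible, resembles the classic greedy minimum point cover on intervals. The following is an illustrative sketch under that assumption; the names and the interval model are ours, not the paper's notation:

```python
def greedy_sync_points(dependences):
    """Return a small set of points such that every interval contains one.

    dependences: list of (start, end) intervals with start <= end, where each
    interval models the span of a loop-carried dependence and a single
    synchronization point placed inside an interval "covers" that dependence.
    Sorting by right endpoint and picking the right endpoint of each
    uncovered interval yields a minimum point cover for plain intervals
    (the full synchronization-placement problem, as the paper shows,
    is NP-complete).
    """
    points = []
    last = None  # position of the most recently placed sync point
    for start, end in sorted(dependences, key=lambda iv: iv[1]):
        if last is None or start > last:  # not covered by the last point
            last = end                    # place a sync at the right endpoint
            points.append(last)
    return points

# Two overlapping dependences share one sync point; the third needs its own.
print(greedy_sync_points([(1, 4), (2, 6), (5, 8)]))  # -> [4, 8]
```

This near-linear greedy pass mirrors the almost-linear time bound claimed for the first algorithm; the paper's second algorithm instead searches for a provably minimal cover via a set-cover formulation solved with 0-1 integer programming.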