Efficient Barrier Synchronization for OpenMP-Like Parallelism on the Intel SCC

H. Al-Khalissi, R. Buchty, Mladen Berekovic
{"title":"Efficient Barrier Synchronization for OpenMP-Like Parallelism on the Intel SCC","authors":"H. Al-Khalissi, R. Buchty, Mladen Berekovic","doi":"10.1109/ICPADS.2013.15","DOIUrl":null,"url":null,"abstract":"The continuous increase of the number of processing cores on die poses a new set of challenges to HPC applications programming including how to model, write, and verify software that has to use the full power of NoC-based manycore processors. Therefore, to simplify program development for the Single-chip Cloud Computer (SCC), it is desirable to have high-level, shared memory-based parallel programming abstractions (e.g., an OpenMP-like programming model). One of the key components of any similar programming model are barrier synchronization primitives, coordinating the work of parallel threads. To allow high-level barrier constructs to deliver good performance, we need an efficient implementation of the underlying synchronization algorithm. In this paper, we propose effective barrier synchronization implementations for shared-memory programming on non-cache-coherent cluster-on-chip represented by the Intel SCC. In particular, we present an extensive evaluation of the overhead associated with integrating barrier algorithms required for OpenMP runtime libraries on such a machine, validating several implementation variants that efficiently exploit the network topology and leveraging SCC-specific hardware. We provide a detailed evaluation of the performance achieved by different approaches by using micro-benchmarks.","PeriodicalId":160979,"journal":{"name":"2013 International Conference on Parallel and Distributed Systems","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 International Conference on Parallel and Distributed Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPADS.2013.15","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

The continuous increase of the number of processing cores on die poses a new set of challenges to HPC applications programming including how to model, write, and verify software that has to use the full power of NoC-based manycore processors. Therefore, to simplify program development for the Single-chip Cloud Computer (SCC), it is desirable to have high-level, shared memory-based parallel programming abstractions (e.g., an OpenMP-like programming model). One of the key components of any similar programming model are barrier synchronization primitives, coordinating the work of parallel threads. To allow high-level barrier constructs to deliver good performance, we need an efficient implementation of the underlying synchronization algorithm. In this paper, we propose effective barrier synchronization implementations for shared-memory programming on non-cache-coherent cluster-on-chip represented by the Intel SCC. In particular, we present an extensive evaluation of the overhead associated with integrating barrier algorithms required for OpenMP runtime libraries on such a machine, validating several implementation variants that efficiently exploit the network topology and leveraging SCC-specific hardware. We provide a detailed evaluation of the performance achieved by different approaches by using micro-benchmarks.
在Intel SCC上实现类似openmp并行的高效屏障同步
芯片上处理内核数量的不断增加给高性能计算应用程序编程带来了一系列新的挑战,包括如何建模、编写和验证必须使用基于cpu的多核处理器的全部功能的软件。因此,为了简化单芯片云计算机(SCC)的程序开发,需要具有高级的、基于共享内存的并行编程抽象(例如,类似openmp的编程模型)。任何类似编程模型的关键组件之一是屏障同步原语,用于协调并行线程的工作。为了允许高级屏障构造提供良好的性能,我们需要底层同步算法的有效实现。在本文中,我们提出了以英特尔SCC为代表的非缓存一致性片上集群的共享内存编程的有效屏障同步实现。特别是,我们对在这样一台机器上集成OpenMP运行时库所需的屏障算法相关的开销进行了广泛的评估,验证了几种有效利用网络拓扑和利用scc特定硬件的实现变体。我们通过使用微基准对不同方法实现的性能进行了详细的评估。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信