Local and Global Shared Memory for Task Based HPC Applications on Heterogeneous Platforms

Chao Liu, M. Leeser
Published in: 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), 21 March 2018
DOI: 10.1109/PDP2018.2018.00055
Citations: 0

Abstract

With the prevalence of multicore and manycore processors, it is important to develop parallel applications that benefit from massively parallel resources. In this work, we introduce a hybrid shared memory mechanism built on a high-level task design. We implemented task-scoped global shared data on top of the one-sided communication feature of MPI-3, and we enable users to implement and create multi-threaded tasks that can execute either on a single node or across multiple nodes. Task threads on distributed nodes share data sets through global shared data objects using one-sided remote memory access (RMA). We ported and developed a set of benchmark applications and tested them on a cluster platform. The high-level task design and hybrid shared memory make parallel programs easier to develop and maintain. The results show that the global shared data delivers good RMA performance, and that the multi-threaded task implementations run up to 20% faster than ordinary OpenMP programs and scale better than MPI programs on multiple nodes.