2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM): Latest Papers

Design and Performance Evaluation of UCX for Tofu-D Interconnect with OpenSHMEM-UCX on Fugaku
2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM) Pub Date : 2022-11-01 DOI: 10.1109/PAW-ATM56565.2022.00010
Yutaka Watanabe, M. Sato, Miwako Tsuji, H. Murai, T. Boku
Abstract: The partitioned global address space (PGAS) model with one-sided communication has recently received attention as an easy and intuitive method for describing remote data access in nodes. PGAS can be implemented using remote direct memory access, which provides lightweight one-sided communication and low overhead synchronization semantics. In this paper, to enable portable, lightweight, and efficient one-sided communication on the Fugaku supercomputer, we designed and implemented Universal Communication X (UCX) for Tofu Interconnect D. An evaluation using OpenSHMEM-UCX and OSHMPI indicates that OpenSHMEM with UCX on Tofu Interconnect D enables smaller latency and better efficiency compared with that for OpenSHMEM with MPI and that it is beneficial for several applications based on PGAS models.
Citations: 0
Composition of Algorithmic Building Blocks in Template Task Graphs
2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM) Pub Date : 2022-11-01 DOI: 10.1109/PAW-ATM56565.2022.00008
T. Hérault, Joseph Schuchart, Edward F. Valeev, G. Bosilca
Abstract: In this paper, we explore the composition capabilities of the Template Task Graph (TTG) programming model. We show how fine-grain composition of tasks is possible in TTG between DAGs belonging to different libraries, even in a distributed setup. We illustrate the benefits of this fine-grain composition on a linear algebra operation, the matrix inversion via the Cholesky method, which consists of three operations that need to be applied in sequence. Evaluation on a cluster of many core shows that the transparent fine-grain composition implements the complex operation without introducing unnecessary synchronizations, increasing the overlap of communication and computation, and thus improving significantly the performance of the entire composed operation.
Citations: 0
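The Template Task Graph abstract above mentions matrix inversion via the Cholesky method as three operations applied in sequence. The following is a minimal sequential sketch of that sequence in plain Python. The abstract does not name the three operations, so the decomposition used here (Cholesky factorization, inversion of the triangular factor, then a triangular product) is an assumption based on the standard textbook formulation, and the function names are hypothetical. It does not use TTG and shows none of the task composition the paper studies.

```python
import math

def cholesky(a):
    """Step 1 (assumed): factor a symmetric positive-definite A as L * L^T."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def invert_lower(l):
    """Step 2 (assumed): invert the lower-triangular factor column by column."""
    n = len(l)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / l[j][j]
        for i in range(j + 1, n):
            s = sum(l[i][k] * inv[k][j] for k in range(j, i))
            inv[i][j] = -s / l[i][i]
    return inv

def transpose_times_self(m):
    """Step 3 (assumed): form M^T * M, which equals A^-1 when M = L^-1."""
    n = len(m)
    return [[sum(m[k][i] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def spd_inverse(a):
    """Apply the three steps in sequence to invert an SPD matrix."""
    return transpose_times_self(invert_lower(cholesky(a)))

def matmul(a, b):
    """Plain matrix product, used only to check the result."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

Calling `matmul(a, spd_inverse(a))` on a small symmetric positive-definite matrix should return the identity up to rounding. In this sketch each step waits for the previous one to finish completely; the abstract's point is that fine-grain composition avoids exactly that kind of synchronization between steps.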
Extending OpenMP and OpenSHMEM for Efficient Heterogeneous Computing
2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM) Pub Date : 2022-11-01 DOI: 10.1109/PAW-ATM56565.2022.00006
Wen-wei Lu, Shilei Tian, Tony Curtis, B. Chapman
Abstract: Heterogeneous supercomputing systems are becoming mainstream thanks to their powerful accelerators. However, the accelerators' special memory model and APIs increase the development complexity and call for innovative programming model designs. To address this issue, OpenMP has added target offloading for portable accelerator programming, and MPI allows transparent send-receive of accelerator memory buffers. Meanwhile, Partitioned Global Address Space (PGAS) languages like OpenSHMEM are falling behind for heterogeneous computing because their special memory models pose additional challenges. We propose language and runtime interoperability extensions for both OpenMP and OpenSHMEM to enable portable remote access on GPU buffers, with minimal amount of code changes. Our modified runtime systems work in coordination to manage accelerator memory, eliminating the need for staging communication buffers. Compared to the standard implementation, our extensions attain 6x point-to-point latency improvement, 1.3x better collective operation latency, 4.9x random access throughput, and up to 12.5% better performance in strong scaling configurations.
Citations: 0
Task Fusion in Distributed Runtimes
2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM) Pub Date : 2022-11-01 DOI: 10.1109/PAW-ATM56565.2022.00007
S. Sundram, Wonchan Lee, A. Aiken
Abstract: We present distributed task fusion, a run-time optimization for task-based runtimes operating on parallel and heterogeneous systems. Distributed task fusion dynamically performs an efficient buffering, analysis, and fusion of asynchronously-evaluated distributed operations, reducing the overheads inherent to scheduling distributed tasks in implicitly parallel frameworks and runtimes. We identify the constraints under which distributed task fusion is permissible and describe an implementation in Legate, a domain-agnostic library for constructing portable and scalable task-based libraries. We present performance results using cuNumeric, a Legate library that enables scalable execution of NumPy pipelines on parallel and heterogeneous systems. We realize speedups up to 1.5x with task fusion enabled on up to 32 P100 GPUs, thus demonstrating efficient execution of pipelines involving many successive fine-grained tasks. Finally, we discuss potential future work, including complementary optimizations that could result in additional performance improvements.
Citations: 1
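The task fusion abstract above describes buffering asynchronously issued operations and fusing them before launch. The toy sketch below illustrates only that general idea on a single process. It is not Legate's or cuNumeric's API; the class `FusingRuntime` and its methods are hypothetical. It also assumes every buffered task is elementwise over the same data, so fusion is always legal, whereas the paper identifies the constraints under which fusion is permissible in a distributed setting.

```python
class FusingRuntime:
    """Toy runtime (hypothetical) that buffers elementwise tasks and fuses them."""

    def __init__(self):
        self.pending = []   # buffered elementwise functions, in issue order
        self.launches = 0   # number of task launches actually performed

    def submit(self, fn):
        """Buffer a task instead of launching it immediately."""
        self.pending.append(fn)

    def flush(self, data):
        """Fuse all buffered tasks into one and launch it over the data once."""
        fused = self.pending
        self.pending = []
        if not fused:
            return list(data)

        def fused_task(x):
            # Apply the buffered operations back to back on one element,
            # preserving the order in which they were submitted.
            for fn in fused:
                x = fn(x)
            return x

        self.launches += 1
        return [fused_task(x) for x in data]


rt = FusingRuntime()
rt.submit(lambda x: x + 1)
rt.submit(lambda x: x * 2)
rt.submit(lambda x: x - 3)
result = rt.flush([1, 2, 3])
```

Three submitted operations result in a single launch here, which is the scheduling overhead that fusion is meant to reduce for pipelines of many small tasks.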
Asynchronous Workload Balancing through Persistent Work-Stealing and Offloading for a Distributed Actor Model Library
2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM) Pub Date : 2022-11-01 DOI: 10.1109/PAW-ATM56565.2022.00009
Yakup Budanaz, Mario Wille, M. Bader
Abstract: With dynamic imbalances caused by both software and ever more complex hardware, applications and runtime systems must adapt to dynamic load imbalances. We present a diffusion-based, reactive, fully asynchronous, and decentralized dynamic load balancer for a distributed actor library. With the asynchronous execution model, features such as remote procedure calls, and support for serialization of arbitrary types, UPC++ is especially feasible for the implementation of the actor model. While providing a substantial speedup for small- to medium-sized jobs with both predictable and unpredictable workload imbalances, the scalability of the diffusion-based approaches remains below expectations in most presented test cases.
Citations: 0
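The load balancing abstract above refers to a diffusion-based scheme. As a rough illustration of what diffusion means in this context, the sketch below runs a textbook first-order diffusion iteration in which each node shifts a fraction of the load difference toward its lighter neighbors. It is an assumption-laden simplification: it is synchronous and centralized, whereas the paper's balancer is asynchronous, reactive, and decentralized, and it does not use UPC++ or any actor library. The names `diffusion_step`, `balance`, and the parameter `alpha` are hypothetical.

```python
def diffusion_step(loads, neighbors, alpha=0.25):
    """One synchronous diffusion step (simplification of the asynchronous case).

    Each node i adjusts its load by alpha * (load[j] - load[i]) for every
    neighbor j. With a symmetric neighbor relation the total load is conserved.
    """
    new_loads = list(loads)
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            new_loads[i] += alpha * (loads[j] - loads[i])
    return new_loads


def balance(loads, neighbors, alpha=0.25, steps=50):
    """Repeat the diffusion step a fixed number of times."""
    for _ in range(steps):
        loads = diffusion_step(loads, neighbors, alpha)
    return loads


# Ring of four nodes; all work starts on node 0.
ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
balanced = balance([100.0, 0.0, 0.0, 0.0], ring)
```

Because load only moves between direct neighbors, imbalances take more steps to even out as the node graph grows. That property of the toy model is offered only as intuition; the abstract itself reports that scalability of the diffusion-based approaches stayed below expectations without attributing it to a specific cause.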