2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM): Latest Publications

Design and Performance Evaluation of UCX for Tofu-D Interconnect with OpenSHMEM-UCX on Fugaku

2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM) Pub Date : 2022-11-01 DOI: 10.1109/PAW-ATM56565.2022.00010

Yutaka Watanabe, M. Sato, Miwako Tsuji, H. Murai, T. Boku

Citations: 0

Composition of Algorithmic Building Blocks in Template Task Graphs

2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM) Pub Date : 2022-11-01 DOI: 10.1109/PAW-ATM56565.2022.00008

T. Hérault, Joseph Schuchart, Edward F. Valeev, G. Bosilca

Citations: 0

Extending OpenMP and OpenSHMEM for Efficient Heterogeneous Computing

2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM) Pub Date : 2022-11-01 DOI: 10.1109/PAW-ATM56565.2022.00006

Wen-wei Lu, Shilei Tian, Tony Curtis, B. Chapman

Abstract: Heterogeneous supercomputing systems are becoming mainstream thanks to their powerful accelerators. However, the accelerators' special memory model and APIs increase development complexity and call for innovative programming model designs. To address this issue, OpenMP has added target offloading for portable accelerator programming, and MPI allows transparent send-receive of accelerator memory buffers. Meanwhile, Partitioned Global Address Space (PGAS) languages like OpenSHMEM are falling behind for heterogeneous computing because their special memory models pose additional challenges. We propose language and runtime interoperability extensions for both OpenMP and OpenSHMEM to enable portable remote access on GPU buffers, with a minimal amount of code changes. Our modified runtime systems work in coordination to manage accelerator memory, eliminating the need for staging communication buffers. Compared to the standard implementation, our extensions attain 6x point-to-point latency improvement, 1.3x better collective operation latency, 4.9x random access throughput, and up to 12.5% better performance in strong scaling configurations.

Citations: 0

Task Fusion in Distributed Runtimes

2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM) Pub Date : 2022-11-01 DOI: 10.1109/PAW-ATM56565.2022.00007

S. Sundram, Wonchan Lee, A. Aiken

Citations: 1

Asynchronous Workload Balancing through Persistent Work-Stealing and Offloading for a Distributed Actor Model Library

2022 IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X (PAW-ATM) Pub Date : 2022-11-01 DOI: 10.1109/PAW-ATM56565.2022.00009

Yakup Budanaz, Mario Wille, M. Bader

Citations: 0