2021 IEEE/ACM 6th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2): Latest Publications

Scalable parallel algorithm for fast computation of Transitive Closure of Graphs on Shared Memory Architectures
Sarthak Patel, Bhrugu Dave, Smit Kumbhani, Mihir Desai, Sidharth Kumar, Bhaskar Chaudhury
DOI: 10.1109/ESPM254806.2021.00006
Abstract: We present a scalable algorithm that computes the transitive closure of a graph on shared memory architectures using the OpenMP API in C++. Two different parallelization strategies have been presented and the performance of the two algorithms has been compared for several data-sets of varying sizes. We demonstrate the scalability of the best parallel implementation up to 176 threads on a shared memory architecture, by producing a graph with more than 3.82 trillion edges. To the best of our knowledge, this is the first implementation that has computed the transitive closure of such a large graph on a shared memory system. Optimization strategies for better cache utilization for large data-sets have been discussed. The important issue of load balancing has been analyzed and its mitigation using the optimal OpenMP scheduling clause has been discussed in detail.
Citations: 1
Accelerating Messages by Avoiding Copies in an Asynchronous Task-based Programming Model
Nitin Bhat, Sam White, Evan Ramos, L. Kalé
DOI: 10.1109/ESPM254806.2021.00007
Abstract: Task-based programming models promise improved communication performance for irregular, fine-grained, and load imbalanced applications. They do so by relaxing some of the messaging semantics of stricter models and taking advantage of those at the lower levels of the software stack. For example, while MPI's two-sided communication model guarantees in-order delivery, requires matching sends to receives, and has the user schedule communication, task-based models generally favor the runtime system scheduling all execution based on the dependencies and message deliveries as they happen. The messaging semantics are critical to enabling high performance. In this paper, we build on previous work that added zero copy semantics to Converse/LRTS. We examine the messaging semantics of Charm++ as it relates to large message buffers, identify shortcomings, and define new communication APIs to address them. Our work enables in-place communication semantics in the context of point-to-point messaging, broadcasts, transmission of read-only variables at program startup, and for migration of chares. We showcase the performance of our new communication APIs using benchmarks for Charm++ and Adaptive MPI, which result in nearly 90% latency improvement and 2x lower peak memory usage.
Citations: 0
Evaluation of Distributed Tasks in Stencil-based Application on GPUs
Eric Raut, Jonathon M. Anderson, M. Araya-Polo, Jie Meng
DOI: 10.1109/ESPM254806.2021.00011
Abstract: In the era of exascale computing, the traditional MPI+X paradigm starts losing its strength in taking advantage of heterogeneous systems. Subsequently, research and development on finding alternative programming models and runtimes have become increasingly popular. This encourages comparison, on competitive grounds, of these emerging parallel programming approaches against the traditional MPI+X paradigm. In this work, an implementation of a distributed task-based stencil numerical simulation is compared with an MPI+X implementation of the same application. Specifically, the Legion task-based parallel programming system is used as an alternative to MPI at the out-of-node level, while the underlying CUDA-implemented kernels are kept at the node level. Therefore, the comparison is as fair as possible and focused on the distributed aspects of the simulation. Overall, the results show that the task-based approach is on par with the traditional MPI approach in terms of both performance and scalability.
Citations: 4
[Copyright notice]
DOI: 10.1109/espm254806.2021.00002
Citations: 0
Taskflow-San: Sanitizing Erroneous Control Flow in Taskflow Graphs
McKay Mower, Luke Majors, Tsung-Wei Huang
DOI: 10.1109/ESPM254806.2021.00009
Abstract: Taskflow is a general-purpose parallel and heterogeneous task graph programming system that enables in-graph control flow to express end-to-end parallelism. By integrating control-flow decisions into condition tasks, developers can efficiently overlap CPU-GPU dependent tasks both inside and outside control flow, largely enhancing the capability of task graph parallelism. Condition tasks are powerful but also mistake-prone. For large task graphs, users can easily encounter erroneous control-flow tasks that cannot be correctly scheduled by the Taskflow runtime. To overcome this challenge, this paper introduces a new instrumentation module, Taskflow-San, to assist users in detecting erroneous control-flow tasks in Taskflow graphs.
Citations: 0
Performance Evaluation of Python Parallel Programming Models: Charm4Py and mpi4py
Zane Fink, Simeng Liu, Jaemin Choi, M. Diener, L. Kalé
DOI: 10.1109/ESPM254806.2021.00010
Abstract: Python is rapidly becoming the lingua franca of machine learning and scientific computing. With the broad use of frameworks such as NumPy, SciPy, and TensorFlow, scientific computing and machine learning are seeing a productivity boost on systems without a requisite loss in performance. While high-performance libraries often provide adequate performance within a node, distributed computing is required to scale Python across nodes and make it genuinely competitive in large-scale high-performance computing. Many frameworks, such as Charm4Py, DaCe, Dask, Legate Numpy, mpi4py, and Ray, scale Python across nodes. However, little is known about these frameworks' relative strengths and weaknesses, leaving practitioners and scientists without enough information about which frameworks are suitable for their requirements. In this paper, we seek to narrow this knowledge gap by studying the relative performance of two such frameworks: Charm4Py and mpi4py. We perform a comparative performance analysis of Charm4Py and mpi4py using CPU- and GPU-based microbenchmarks and other representative mini-apps for scientific computing.
Citations: 4
Parallel SIMD - A Policy Based Solution for Free Speed-Up using C++ Data-Parallel Types
Srinivas Yadav, Nikunj Gupta, Auriane Reverdell, H. Kaiser
DOI: 10.1109/ESPM254806.2021.00008
Abstract: Recent additions to the C++ standard and ongoing standardization efforts aim to add data-parallel types to the C++ standard library. This enables the use of vectorization techniques in existing C++ codes without having to rely on the C++ compiler's abilities to auto-vectorize the code's execution. The integration of the existing parallel algorithms with these new data-parallel types opens up a new way of speeding up existing codes with minimal effort. Today, only very little implementation experience exists for potential data-parallel execution of the standard parallel algorithms. In this paper, we report on experiences and performance analysis results for our implementation of two new data-parallel execution policies usable with HPX's parallel algorithms module: simd and par_simd. We utilize the new experimental implementation of data-parallel types provided by recent versions of the GCC and Clang C++ standard libraries. The benchmark results collected from artificial tests and real-world codes presented in this paper are very promising. Compared to sequenced execution, we report on speed-ups of more than three orders of magnitude when executed using the newly implemented data-parallel execution policy par_simd with HPX's parallel algorithms. We also report that our implementation is performance portable across different compute architectures (x64 Intel and AMD, and Arm), using different vectorization extensions (AVX2, AVX512, and NEON128).
Citations: 4