2021 IEEE/ACM 6th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2): Latest Publications

Scalable parallel algorithm for fast computation of Transitive Closure of Graphs on Shared Memory Architectures
Sarthak Patel, Bhrugu Dave, Smit Kumbhani, Mihir Desai, Sidharth Kumar, Bhaskar Chaudhury
DOI: 10.1109/ESPM254806.2021.00006
Abstract: We present a scalable algorithm that computes the transitive closure of a graph on shared memory architectures using the OpenMP API in C++. Two different parallelization strategies have been presented and the performance of the two algorithms has been compared for several data-sets of varying sizes. We demonstrate the scalability of the best parallel implementation up to 176 threads on a shared memory architecture, by producing a graph with more than 3.82 trillion edges. To the best of our knowledge, this is the first implementation that has computed the transitive closure of such a large graph on a shared memory system. Optimization strategies for better cache utilization for large data-sets have been discussed. The important issue of load balancing has been analyzed and its mitigation using the optimal OpenMP scheduling clause has been discussed in detail.
Citations: 1
Accelerating Messages by Avoiding Copies in an Asynchronous Task-based Programming Model
Nitin Bhat, Sam White, Evan Ramos, L. Kalé
DOI: 10.1109/ESPM254806.2021.00007
Abstract: Task-based programming models promise improved communication performance for irregular, fine-grained, and load imbalanced applications. They do so by relaxing some of the messaging semantics of stricter models and taking advantage of those at the lower levels of the software stack. For example, while MPI's two-sided communication model guarantees in-order delivery, requires matching sends to receives, and has the user schedule communication, task-based models generally favor the runtime system scheduling all execution based on the dependencies and message deliveries as they happen. The messaging semantics are critical to enabling high performance. In this paper, we build on previous work that added zero copy semantics to Converse/LRTS. We examine the messaging semantics of Charm++ as it relates to large message buffers, identify shortcomings, and define new communication APIs to address them. Our work enables in-place communication semantics in the context of point-to-point messaging, broadcasts, transmission of read-only variables at program startup, and for migration of chares. We showcase the performance of our new communication APIs using benchmarks for Charm++ and Adaptive MPI, which result in nearly 90% latency improvement and 2x lower peak memory usage.
Citations: 0
Evaluation of Distributed Tasks in Stencil-based Application on GPUs
Eric Raut, Jonathon M. Anderson, M. Araya-Polo, Jie Meng
DOI: 10.1109/ESPM254806.2021.00011
Abstract: In the era of exascale computing, the traditional MPI+X paradigm starts losing its strength in taking advantage of heterogeneous systems. Subsequently, research and development on finding alternative programming models and runtimes have become increasingly popular. This encourages comparison, on competitive grounds, of these emerging parallel programming approaches against the traditional MPI+X paradigm. In this work, an implementation of a distributed task-based stencil numerical simulation is compared with an MPI+X implementation of the same application. Specifically, the Legion task-based parallel programming system is used as an alternative to MPI at the out-of-node level, while the underlying CUDA-implemented kernels are kept at the node level. Therefore, the comparison is as fair as possible and focused on the distributed aspects of the simulation. Overall, the results show that the task-based approach is on par with the traditional MPI approach in terms of both performance and scalability.
Citations: 4
[Copyright notice]
DOI: 10.1109/espm254806.2021.00002
Citations: 0
Taskflow-San: Sanitizing Erroneous Control Flow in Taskflow Graphs
McKay Mower, Luke Majors, Tsung-Wei Huang
DOI: 10.1109/ESPM254806.2021.00009
Abstract: Taskflow is a general-purpose parallel and heterogeneous task graph programming system that enables in-graph control flow to express end-to-end parallelism. By integrating control-flow decisions into condition tasks, developers can efficiently overlap CPU-GPU dependent tasks both inside and outside control flow, largely enhancing the capability of task graph parallelism. Condition tasks are powerful but also mistake-prone. For large task graphs, users can easily encounter erroneous control-flow tasks that cannot be correctly scheduled by the Taskflow runtime. To overcome this challenge, this paper introduces a new instrumentation module, Taskflow-San, to assist users in detecting erroneous control-flow tasks in Taskflow graphs.
Citations: 0
Performance Evaluation of Python Parallel Programming Models: Charm4Py and mpi4py
Zane Fink, Simeng Liu, Jaemin Choi, M. Diener, L. Kalé
DOI: 10.1109/ESPM254806.2021.00010
Abstract: Python is rapidly becoming the lingua franca of machine learning and scientific computing. With the broad use of frameworks such as NumPy, SciPy, and TensorFlow, scientific computing and machine learning are seeing a productivity boost on systems without a requisite loss in performance. While high-performance libraries often provide adequate performance within a node, distributed computing is required to scale Python across nodes and make it genuinely competitive in large-scale high-performance computing. Many frameworks, such as Charm4Py, DaCe, Dask, Legate Numpy, mpi4py, and Ray, scale Python across nodes. However, little is known about these frameworks' relative strengths and weaknesses, leaving practitioners and scientists without enough information about which frameworks are suitable for their requirements. In this paper, we seek to narrow this knowledge gap by studying the relative performance of two such frameworks: Charm4Py and mpi4py. We perform a comparative performance analysis of Charm4Py and mpi4py using CPU- and GPU-based microbenchmarks and other representative mini-apps for scientific computing.
Citations: 4
Parallel SIMD - A Policy Based Solution for Free Speed-Up using C++ Data-Parallel Types
Srinivas Yadav, Nikunj Gupta, Auriane Reverdell, H. Kaiser
DOI: 10.1109/ESPM254806.2021.00008
Abstract: Recent additions to the C++ standard and ongoing standardization efforts aim to add data-parallel types to the C++ standard library. This enables the use of vectorization techniques in existing C++ codes without having to rely on the C++ compiler's abilities to auto-vectorize the code's execution. The integration of the existing parallel algorithms with these new data-parallel types opens up a new way of speeding up existing codes with minimal effort. Today, only very little implementation experience exists for potential data-parallel execution of the standard parallel algorithms. In this paper, we report on experiences and performance analysis results for our implementation of two new data-parallel execution policies usable with HPX's parallel algorithms module: simd and par_simd. We utilize the new experimental implementation of data-parallel types provided by recent versions of the GCC and Clang C++ standard libraries. The benchmark results collected from artificial tests and real-world codes presented in this paper are very promising. Compared to sequenced execution, we report on speed-ups of more than three orders of magnitude when executed using the newly implemented data-parallel execution policy par_simd with HPX's parallel algorithms. We also report that our implementation is performance portable across different compute architectures (x64 Intel and AMD, and Arm), using different vectorization extensions (AVX2, AVX512, and NEON128).
Citations: 4