International Conference on Partitioned Global Address Space Programming Models: Latest Papers (Page 2)

Efficient Interoperability of OpenSHMEM on Multicore Architectures

International Conference on Partitioned Global Address Space Programming Models Pub Date : 2014-10-06 DOI: 10.1145/2676870.2676889

K. Ibrahim

Citations: 0

Multi-Threaded OpenSHMEM: A Bad Idea?

International Conference on Partitioned Global Address Space Programming Models Pub Date : 2014-10-06 DOI: 10.1145/2676870.2676890

Gabriele Jost, U. Hanebutte, James Dinan

Citations: 1

Towards a matrix-oriented strided interface in OpenSHMEM

International Conference on Partitioned Global Address Space Programming Models Pub Date : 2014-10-06 DOI: 10.1145/2676870.2676888

J. Hammond

Citations: 5

Experiences at scale with PGAS versions of a Hydrodynamics application

International Conference on Partitioned Global Address Space Programming Models Pub Date : 2014-10-06 DOI: 10.1145/2676870.2676873

A. Mallinson, S. Jarvis, W. Gaudin, J. Herdman

Abstract: In this work we directly evaluate two PGAS programming models, CAF and OpenSHMEM, as candidate technologies for improving the performance and scalability of scientific applications on future exascale HPC platforms. PGAS approaches are considered by many to represent a promising research direction with the potential to solve some of the existing problems preventing codebases from scaling to exascale levels of performance. The aim of this work is to better inform the exascale planning at large HPC centres such as AWE. Such organisations invest significant resources maintaining and updating existing scientific codebases, many of which were not designed to run at the scales required to reach exascale levels of computational performance on future system architectures. We document our approach for implementing a recently developed Lagrangian-Eulerian explicit hydrodynamics mini-application in each of these PGAS languages. Furthermore, we also present our results and experiences from scaling these different approaches to high node counts on two state-of-the-art, large scale system architectures from Cray (XC30) and SGI (ICE-X), and compare their utility against an equivalent existing MPI implementation.

Citations: 12

Native Mode-Based Optimizations of Remote Memory Accesses in OpenSHMEM for Intel Xeon Phi

International Conference on Partitioned Global Address Space Programming Models Pub Date : 2014-10-06 DOI: 10.1145/2676870.2676881

N. Namashivayam, Sayan Ghosh, Dounia Khaldi, Deepak Eachempati, B. Chapman

Abstract: OpenSHMEM is a PGAS library that aims to deliver high performance while retaining portability. Communication operations are a major obstacle to scalable parallel performance and are highly dependent on the target architecture. However, to date there has been no work on how to efficiently support OpenSHMEM running natively on Intel Xeon Phi, a highly-parallel, power-efficient and widely-used many-core architecture. Given the importance of communication in parallel architectures, this paper describes a novel methodology for optimizing remote-memory accesses for execution of OpenSHMEM programs on Intel Xeon Phi processors. In native mode, we can exploit the Xeon Phi shared memory and convert OpenSHMEM one-sided communication calls into local load/store statements using the shmem_ptr routine. This approach makes it possible for the compiler to perform essential optimizations for Xeon Phi such as vectorization. To the best of our knowledge, this is the first time the impact of shmem_ptr is analyzed thoroughly on a many-core system. We show the benefits of this approach on the PGAS-Microbenchmarks we specifically developed for this research. Our results exhibit a decrease in latency for one-sided communication operations by up to 60% and increase in bandwidth by up to 12x. Moreover, we study different reduction algorithms and exploit local load/store to optimize data transfers in these algorithms for Xeon Phi which permits improvement of up to 22% compared to MVAPICH and up to 60% compared to Intel MPI. Apart from microbenchmarks, experimental results on NAS IS and SP benchmarks show that performance gains of up to 20x are possible.

Citations: 8

DART-MPI: An MPI-based Implementation of a PGAS Runtime System

International Conference on Partitioned Global Address Space Programming Models Pub Date : 2014-10-06 DOI: 10.1145/2676870.2676875

Huan Zhou, Yousri Mhedheb, K. Idrees, C. W. Glass, J. Gracia, K. Fürlinger, J. Tao

Citations: 28

OpenSHMEM Reference Implementation using UCCS-uGNI Transport Layer

International Conference on Partitioned Global Address Space Programming Models Pub Date : 2014-10-06 DOI: 10.1145/2676870.2676892

T. Janjusic, Pavel Shamis, Manjunath Gorentla Venkata, S. Poole

Abstract: OpenSHMEM is a library interface implementation and specification that enables the implementation of the Partitioned Global Address Space (PGAS) model. It exports modern RDMA network functionality and communication semantics to applications very efficiently. There are many closed source implementations of OpenSHMEM for modern RDMA interconnects such as InfiniBand and Cray's Gemini and Aries. Given the important role that Cray systems play in HPC, in this paper, we present an open source implementation of OpenSHMEM for Cray XE/XK/XC systems. To implement OpenSHMEM, we use the uGNI interface. uGNI is a generic interface that is designed for multiple programming models. The interface fits well the goal of UCCS. Having OpenSHMEM with UCCS-uGNI allows usage of the same implementation over multiple interconnects. This also translates into many advantages that come with common code such as resource sharing, increasing productivity because of less code maintenance, etc. Preliminary results show that OpenSHMEM-UCCS performs comparably to state-of-the-art Cray SHMEM for Put, Get, and AMO operations.

Citations: 2

Extending the OpenSHMEM Memory Model to Support User-Defined Spaces

International Conference on Partitioned Global Address Space Programming Models Pub Date : 2014-10-06 DOI: 10.1145/2676870.2676884

A. Welch, S. Pophale, Pavel Shamis, Oscar R. Hernandez, S. Poole, B. Chapman

Abstract: OpenSHMEM is an open standard for SHMEM libraries. With the standardisation process complete, the community is looking towards extending the API for increasing programmer flexibility and extreme scalability. According to the current OpenSHMEM specification (revision 1.1), allocation of symmetric memory is collective across all PEs executing the application. For better work sharing and memory utilisation, we are proposing the concepts of teams and spaces for OpenSHMEM that together allow allocation of memory only across user-specified teams. Through our implementation we show that by using teams we can confine memory allocation and usage to only the PEs that actually communicate via symmetric memory. We provide our preliminary results that demonstrate creating spaces for teams allows for less consumption of memory resources than the current alternative. We also examine the impact of our extensions on Scalable Synthetic Compact Applications #3 (SSCA3), which is a sensor processing and knowledge formation kernel involving file I/O, and show that up to 30% of symmetric memory allocation can be eliminated without affecting the correctness of the benchmark.

Citations: 15

Contexts: A Mechanism for High Throughput Communication in OpenSHMEM

International Conference on Partitioned Global Address Space Programming Models Pub Date : 2014-10-06 DOI: 10.1145/2676870.2676872

James Dinan, Mario Flajslik

Abstract: This paper introduces a proposed extension to the OpenSHMEM parallel programming model, called communication contexts. Contexts introduce a new construct that allows a programmer to generate independent streams of communication operations. In hybrid executions where multiple threads execute within an OpenSHMEM process, contexts eliminate interference between threads, and enable the OpenSHMEM library to map operations generated by threads to private communication resource sets. By providing thread isolation, contexts eliminate synchronization overheads and enable each thread to drive a similar set of resources and achieve performance comparable to an OpenSHMEM process. In conventional, single-threaded execution, contexts provide greater control over ordering of operations and can improve communication and computation overlap. A detailed description of the contexts interface and its implementation for the Portals 4 network programming interface is described. The implementation is evaluated using Mandelbrot set and integer sorting (IS) benchmarks. Contexts provide a 25% performance improvement for Mandelbrot by eliminating thread interference and enabling pipelining, and a 35% improvement was achieved for IS by enabling more effective communication/computation overlap.

Citations: 21

A Heterogeneous GASNet Implementation for FPGA-accelerated Computing

International Conference on Partitioned Global Address Space Programming Models Pub Date : 2014-10-06 DOI: 10.1145/2676870.2676885

Ruediger Willenberg, P. Chow

Citations: 8