Title: Designing Scalable Out-of-core Sorting with Hybrid MPI+PGAS Programming Models
Authors: Jithin Jose, S. Potluri, H. Subramoni, Xiaoyi Lu, Khaled Hamidouche, K. Schulz, H. Sundar, D. Panda
Venue: International Conference on Partitioned Global Address Space Programming Models (PGAS 2014), October 6, 2014
DOI: https://doi.org/10.1145/2676870.2676880
Abstract: While Hadoop holds the current Sort Benchmark record, previous research has shown that MPI-based solutions can deliver similar performance. However, most existing MPI-based designs rely on two-sided communication semantics. The emerging Partitioned Global Address Space (PGAS) programming model offers a flexible way to express parallelism for data-intensive applications, yet not all portions of data analytics applications are amenable to conversion to PGAS models. In this study, we propose a novel design of the out-of-core, k-way parallel sort algorithm that takes advantage of the features of both the MPI and OpenSHMEM PGAS models. To the best of our knowledge, this is the first design of a data-intensive computing application using hybrid MPI+PGAS models. Our experimental evaluation indicates that the proposed framework outperforms an existing MPI-based design by up to 45% at 8,192 processes. It also achieves a 7X improvement over a Hadoop-based sort using the same amount of resources at 1,024 cores.

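To make the hybrid style concrete, the sketch below combines an MPI collective with OpenSHMEM one-sided puts in a single program. It is only an illustration of that combination, not the authors' out-of-core k-way sort, and it assumes a unified MPI+PGAS runtime (such as MVAPICH2-X) in which each MPI rank and OpenSHMEM PE refer to the same process.

```cpp
// Minimal sketch of the hybrid MPI + OpenSHMEM style the paper builds on,
// NOT the authors' out-of-core k-way sort. Assumes a unified runtime
// (e.g., MVAPICH2-X) where MPI rank i and OpenSHMEM PE i are the same process.
#include <mpi.h>
#include <shmem.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    shmem_init();

    int pe   = shmem_my_pe();
    int npes = shmem_n_pes();

    // Symmetric buffer that remote PEs can write into with one-sided puts.
    const int SLOT = 1024;
    long *recv_keys = (long *)shmem_malloc(npes * SLOT * sizeof(long));

    // Collective step: agree on one splitter per PE via MPI.
    long my_sample = 17 * pe;               // placeholder local sample
    long *splitters = new long[npes];
    MPI_Allgather(&my_sample, 1, MPI_LONG, splitters, 1, MPI_LONG, MPI_COMM_WORLD);

    // One-sided step: push local keys destined for a target PE directly into
    // its symmetric buffer, without the target posting a matching receive.
    long local_keys[SLOT] = {0};
    int target = (pe + 1) % npes;
    shmem_putmem(recv_keys + pe * SLOT, local_keys, SLOT * sizeof(long), target);
    shmem_barrier_all();                    // ensure all puts are visible

    delete[] splitters;
    shmem_free(recv_keys);
    shmem_finalize();
    MPI_Finalize();
    return 0;
}
```
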
Title: Asymmetric Memory Extension for OpenSHMEM
Authors: Latchesar Ionkov, Ginger Young
Venue: International Conference on Partitioned Global Address Space Programming Models (PGAS 2014), October 6, 2014
DOI: https://doi.org/10.1145/2676870.2676887
Abstract: Memory allocation in OpenSHMEM is a global operation that requires in-step calls from all Processing Elements (PEs). Although this approach works for applications that split the work evenly, it prevents the use of OpenSHMEM in cases where the workload, and the memory it uses, are allocated dynamically and can change significantly while the application is running. To broaden the cases where OpenSHMEM can be used, we propose an extension: asymmetric memory support. The extension allows PEs to allocate memory independently and to make this memory available for remote access by other PEs.

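The sketch below shows the symmetric-allocation constraint the extension is meant to relax. The abstract does not specify the extension's API, so only the standard, collective shmem_malloc appears, and the per-PE workload sizes are invented for illustration.

```cpp
// Sketch of the constraint the paper addresses, not of its proposed extension:
// standard OpenSHMEM allocation is symmetric, so every PE must issue the same
// shmem_malloc call in step, even when only some PEs actually need the memory.
#include <shmem.h>
#include <cstddef>

int main() {
    shmem_init();
    int pe = shmem_my_pe();

    // Suppose each PE's real requirement varies at run time (hypothetical workload).
    std::size_t my_need = 1024u * (std::size_t)(pe + 1);
    (void)my_need;   // cannot be used directly: symmetric allocation needs one agreed size

    // All PEs therefore over-allocate to a common upper bound (real codes would
    // first compute the global maximum, e.g. with a max-reduction).
    std::size_t agreed = 1024u * (std::size_t)shmem_n_pes();
    long *buf = (long *)shmem_malloc(agreed * sizeof(long));

    // The proposed asymmetric extension would instead let each PE allocate
    // my_need bytes independently and still expose them for remote access;
    // the abstract does not give its API, so no such call is shown here.

    shmem_free(buf);
    shmem_finalize();
    return 0;
}
```
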
Title: HPX: A Task Based Programming Model in a Global Address Space
Authors: Hartmut Kaiser, T. Heller, Bryce Adelstein-Lelbach, Adrian Serio, D. Fey
Venue: International Conference on Partitioned Global Address Space Programming Models (PGAS 2014), October 6, 2014
DOI: https://doi.org/10.1145/2676870.2676883
Abstract: The significant increase in complexity of Exascale platforms, due to energy-constrained, billion-way parallelism and major changes to processor and memory architecture, requires new energy-efficient and resilient programming techniques that are portable across multiple future generations of machines. We believe that guaranteeing adequate scalability, programmability, performance portability, resilience, and energy efficiency requires a fundamentally new approach, combined with a transition path for existing scientific applications, to fully explore the rewards of today's and tomorrow's systems. We present HPX, a parallel runtime system which extends the C++11/14 standard to facilitate distributed operations, enable fine-grained, constraint-based parallelism, and support runtime-adaptive resource management. This provides a widely accepted API enabling programmability, composability, and performance portability of user applications. By employing a global address space, we seamlessly augment the standard to apply to the distributed case. We present HPX's architecture, design decisions, and results selected from a diverse set of application runs showing superior performance, scalability, and efficiency over conventional practice.

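A minimal, local-only sketch of HPX's futurized task style follows. The distributed, global-address-space features discussed in the paper are not shown, and the header names assumed here (hpx/hpx_main.hpp, hpx/future.hpp) vary across HPX releases.

```cpp
// Minimal sketch of HPX's futurized, task-based style (local tasks only;
// the distributed/AGAS features described in the paper are not shown).
// Header layout differs across HPX releases; <hpx/hpx_main.hpp> and
// <hpx/future.hpp> are assumed here.
#include <hpx/hpx_main.hpp>
#include <hpx/future.hpp>
#include <iostream>

int square(int x) { return x * x; }

int main() {
    // hpx::async spawns a lightweight HPX task and returns a future,
    // mirroring std::async from the C++11/14 standard the runtime extends.
    hpx::future<int> f = hpx::async(square, 7);

    // Continuations express fine-grained, constraint-based parallelism:
    // the next task runs only when its input future becomes ready.
    hpx::future<int> g = f.then([](hpx::future<int> r) { return r.get() + 1; });

    std::cout << g.get() << '\n';   // prints 50
    return 0;
}
```
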
Title: Fault Tolerance for OpenSHMEM
Authors: Pengfei Hao, Pavel Shamis, Manjunath Gorentla Venkata, S. Pophale, A. Welch, S. Poole, B. Chapman
Venue: International Conference on Partitioned Global Address Space Programming Models (PGAS 2014), October 6, 2014
DOI: https://doi.org/10.1145/2676870.2676894
Abstract: On today's supercomputing systems, faults are becoming the norm rather than the exception. Given the complexity required for achieving the expected scalability and performance on future systems, this situation is expected to become worse: systems will have to function in a nearly constant presence of faults. To be productive on these systems, programming models will require both hardware and software to be resilient to faults. With the growing importance of the PGAS programming model, and of OpenSHMEM as part of the HPC software stack, the lack of a fault-tolerance model may become a liability for its users. Toward this end, in this paper we discuss the viability of using checkpoint/restart as a fault-tolerance method for OpenSHMEM, propose a selective checkpoint/restart fault-tolerance model, and discuss the challenges associated with implementing the proposed model.

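As a point of reference for the terminology, the sketch below shows a generic, application-level checkpoint of one symmetric region, one file per PE. It is not the selective checkpoint/restart model the paper proposes, whose interface is not described in the abstract.

```cpp
// Generic sketch of application-level checkpointing of a symmetric region,
// only to make the terms concrete; the paper's selective checkpoint/restart
// model and its API are not described in the abstract and are not shown.
#include <shmem.h>
#include <cstdio>
#include <cstring>

int main() {
    shmem_init();
    int pe = shmem_my_pe();

    const size_t N = 1 << 20;
    double *state = (double *)shmem_malloc(N * sizeof(double));
    std::memset(state, 0, N * sizeof(double));

    // ... compute, with remote puts/gets updating `state` ...

    shmem_barrier_all();               // quiesce communication before saving
    char path[64];
    std::snprintf(path, sizeof(path), "ckpt_pe%d.bin", pe);
    if (FILE *f = std::fopen(path, "wb")) {   // one checkpoint file per PE
        std::fwrite(state, sizeof(double), N, f);
        std::fclose(f);
    }
    shmem_barrier_all();               // all PEs reach a consistent checkpoint

    shmem_free(state);
    shmem_finalize();
    return 0;
}
```
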
Title: Affine Loop Optimization Based on Modulo Unrolling in Chapel
Authors: Aroon Sharma, Darren Smith, Joshua Koehler, R. Barua, Michael P. Ferguson
Venue: International Conference on Partitioned Global Address Space Programming Models (PGAS 2014), October 6, 2014
DOI: https://doi.org/10.1145/2676870.2676877
Abstract: This paper presents modulo unrolling without unrolling (modulo unrolling WU), a method for message aggregation for parallel loops with affine array accesses in Chapel, a Partitioned Global Address Space (PGAS) parallel programming language. Messages incur a non-trivial run-time overhead, a significant component of which is independent of the size of the message; therefore, aggregating messages improves performance. Our optimization for message aggregation is based on a technique known as modulo unrolling, pioneered by Barua [3], whose purpose was to ensure a statically predictable single tile number for each memory reference on tiled architectures such as the MIT Raw Machine [18]. Modulo unrolling WU applies to data that is distributed in a cyclic or block-cyclic manner. In this paper, we adapt the aforementioned modulo unrolling technique to the difficult problem of efficiently compiling PGAS languages to message passing architectures. When applied to loops and data distributed cyclically or block-cyclically, modulo unrolling WU can decide when to aggregate messages, thereby reducing the overall message count and runtime for a particular loop. Compared to other methods, modulo unrolling WU greatly simplifies the complex problem of automatic generation of message passing code. It also results in substantial performance improvement compared to the unoptimized Chapel compiler.

To implement this optimization in Chapel, we modify the leader and follower iterators in the Cyclic and Block Cyclic data distribution modules. We collected results comparing the performance of Chapel programs optimized with modulo unrolling WU against Chapel programs using the existing Chapel data distributions. Data collected on a ten-locale cluster show that, on average, modulo unrolling WU used with Chapel's Cyclic distribution results in 64 percent fewer messages and a 36 percent decrease in runtime for our suite of benchmarks. Similarly, modulo unrolling WU used with Chapel's Block Cyclic distribution results in 72 percent fewer messages and a 53 percent decrease in runtime.

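The aggregation idea can be restated outside Chapel: under a cyclic distribution, the elements a loop touches on a given locale sit at consecutive local offsets, so one bulk transfer per locale can replace many tiny ones. The sketch below recasts this in OpenSHMEM-style C++ purely for illustration; the paper's actual contribution is a Chapel compiler and runtime transformation.

```cpp
// Sketch of the message-aggregation idea behind modulo unrolling, recast in
// plain OpenSHMEM C++ rather than the Chapel transformation the paper
// implements. A is distributed cyclically: global element i lives on
// PE (i % npes) at local offset (i / npes).
#include <shmem.h>
#include <vector>

int main() {
    shmem_init();
    int npes = shmem_n_pes();

    const int LOCAL = 4096;
    long *A = (long *)shmem_malloc(LOCAL * sizeof(long)); // cyclic block per PE

    const int N = LOCAL * npes;          // global length
    std::vector<long> copy(N);

    // Naive access: one tiny remote get per element -> N messages.
    // for (int i = 0; i < N; ++i)
    //     shmem_getmem(&copy[i], &A[i / npes], sizeof(long), i % npes);

    // Aggregated access: after "unrolling by the modulus" npes, every element
    // with i % npes == p sits at consecutive local offsets on PE p, so one
    // bulk get per PE suffices -> npes messages instead of N.
    std::vector<long> chunk(LOCAL);
    for (int p = 0; p < npes; ++p) {
        shmem_getmem(chunk.data(), A, LOCAL * sizeof(long), p);
        for (int k = 0; k < LOCAL; ++k)
            copy[(size_t)k * npes + p] = chunk[k];
    }

    shmem_free(A);
    shmem_finalize();
    return 0;
}
```
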
Title: Early Evaluation of Scalable Fabric Interface for PGAS Programming Models
Authors: Miao Luo, Kayla Seager, K. S. Murthy, C. Archer, S. Sur, Sean Hefty
Venue: International Conference on Partitioned Global Address Space Programming Models (PGAS 2014), October 6, 2014
DOI: https://doi.org/10.1145/2676870.2676871
Abstract: Inter-processor communication is a critical factor for performance at scale. In order to achieve good performance, communication overheads should be minimized, and the fabric interface library plays a major role in determining those overheads. This is particularly important for Partitioned Global Address Space (PGAS) programming models, as these models have been designed for very low-overhead remote memory access.

The OpenFabrics Alliance has recently initiated an effort to revamp the fabric communication interface to better suit parallel programming models. The new open-source interface is called the Scalable Fabric Interface (SFI). Its chief distinguishing feature is that the new interfaces are co-designed along with the applications that use them, such as PGAS communication libraries.

In this paper we present an early evaluation of the mapping of PGAS libraries onto SFI by implementing prototypes of the popular GASNet library and of OpenSHMEM over SFI. Our analysis indicates that the overheads of mapping to SFI are significantly lower than those of mapping to the current OpenFabrics Verbs communication interface. Compared to similar mappings over OpenFabrics Verbs, we reduce the number of instructions in mapping GASNet to SFI by 82%, Berkeley UPC over GASNet to SFI by 80%, and OpenSHMEM to SFI by 95%.

Title: Scalable MiniMD Design with Hybrid MPI and OpenSHMEM
Authors: Mingzhe Li, Jian Lin, Xiaoyi Lu, Khaled Hamidouche, K. Tomko, D. Panda
Venue: International Conference on Partitioned Global Address Space Programming Models (PGAS 2014), October 6, 2014
DOI: https://doi.org/10.1145/2676870.2676893
Abstract: The MPI programming model has been widely used for scientific applications. The emergence of Partitioned Global Address Space (PGAS) programming models presents an alternative approach to improve programmability. With a global data view and lightweight communication operations, PGAS has the potential to increase the performance of scientific applications at scale. However, since PGAS models are still emerging, it is unlikely that entire applications will be re-written with them. Instead, unified communication runtimes have paved the way for a new class of hybrid applications that can leverage the benefits of both MPI and PGAS models. In this paper, we re-design an existing MPI-based scientific mini-application (MiniMD) with the MPI and OpenSHMEM programming models. We propose two alternative designs using MPI and OpenSHMEM and compare the performance and scalability of those designs with the original MPI-based implementation. Our performance evaluations using MVAPICH2-X (a unified MPI+PGAS communication runtime over InfiniBand) show a 17% reduction in total execution time compared to the existing MPI-based design at 1,024 cores.

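As an illustration of the kind of pattern such a redesign targets, the sketch below replaces a two-sided halo (ghost) exchange with an OpenSHMEM put plus a flag update. It is a generic neighbor exchange, not the authors' MiniMD code.

```cpp
// Sketch of the kind of neighbor (halo) exchange an MPI-based MiniMD might
// re-express with OpenSHMEM one-sided semantics; this illustrates the pattern,
// not the authors' MiniMD redesign.
#include <shmem.h>

int main() {
    shmem_init();
    int pe    = shmem_my_pe();
    int npes  = shmem_n_pes();
    int right = (pe + 1) % npes;

    const int GHOST = 512;
    // Symmetric buffers: a ghost region and a "data arrived" flag on every PE.
    double *ghost = (double *)shmem_malloc(GHOST * sizeof(double));
    long *flag = (long *)shmem_malloc(sizeof(long));
    *flag = 0;
    shmem_barrier_all();

    double boundary[GHOST] = {0};        // local boundary atoms to export

    // Push boundary data straight into the right neighbor's ghost buffer,
    // then raise its flag; no matching receive is posted on the target.
    shmem_putmem(ghost, boundary, GHOST * sizeof(double), right);
    shmem_fence();                       // order the put before the flag update
    shmem_long_p(flag, 1, right);

    // Wait for the left neighbor's data to land in our own ghost buffer.
    shmem_long_wait_until(flag, SHMEM_CMP_EQ, 1);

    shmem_free(flag);
    shmem_free(ghost);
    shmem_finalize();
    return 0;
}
```
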
Title: OpenCoarrays: Open-source Transport Layers Supporting Coarray Fortran Compilers
Authors: A. Fanfarillo, T. Burnus, V. Cardellini, S. Filippone, D. Nagle, D. Rouson
Venue: International Conference on Partitioned Global Address Space Programming Models (PGAS 2014), October 6, 2014
DOI: https://doi.org/10.1145/2676870.2676876
Abstract: Coarray Fortran is a set of features of the Fortran 2008 standard that make Fortran a PGAS parallel programming language. Two commercial compilers currently support coarrays: Cray and Intel. Here we present two coarray transport layers provided by the new OpenCoarrays project: one library based on MPI and the other on GASNet. We link the GNU Fortran (GFortran) compiler to either of the two OpenCoarrays implementations and present performance comparisons between executables produced by GFortran and by the Cray and Intel compilers. The comparison includes synthetic benchmarks, application prototypes, and an application kernel. In our tests, Intel outperforms GFortran only on intra-node small transfers (in particular, scalars). GFortran outperforms Intel on intra-node array transfers and in all settings that require inter-node transfers. The Cray comparisons are mixed, with either GFortran or Cray being faster depending on the chosen hardware platform, network, and transport layer.

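To make the role of a transport layer concrete, the sketch below shows roughly how a remote coarray assignment such as a(:)[img] = b(:) can be lowered onto MPI one-sided operations. It is an assumption-laden illustration written in C++ rather than Fortran; the real OpenCoarrays MPI layer is considerably more elaborate.

```cpp
// Rough sketch of how a Fortran coarray assignment such as  a(:)[img] = b(:)
// can be lowered onto MPI one-sided operations, in the spirit of an MPI-based
// transport layer; the real OpenCoarrays library differs in detail.
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1024;
    double *a = nullptr;                  // the coarray's memory on this image
    MPI_Win win;                          // one window per coarray
    MPI_Win_allocate(N * sizeof(double), sizeof(double), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &a, &win);

    double b[N];                          // local (non-coarray) source
    for (int i = 0; i < N; ++i) b[i] = rank + i;

    int img = (rank + 1) % size;          // target image of the assignment
    MPI_Win_fence(0, win);
    MPI_Put(b, N, MPI_DOUBLE, img, /*target displacement*/ 0, N, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);                // assignment complete on all images

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```
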
Title: A Multithreaded Communication Substrate for OpenSHMEM
Authors: Aurélien Bouteiller, T. Hérault, G. Bosilca
Venue: International Conference on Partitioned Global Address Space Programming Models (PGAS 2014), October 6, 2014
DOI: https://doi.org/10.1145/2676870.2676895
Abstract: OpenSHMEM scalability is strongly dependent on the capability of its communication layer to efficiently handle multiple threads. In this paper, we present an early evaluation of the thread safety specification in the Unified Common Communication Substrate (UCCS) employed in OpenSHMEM. Results demonstrate that thread safety can be provided at an acceptable cost and can improve efficiency for some operations, compared to serializing communication.

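The usage pattern under evaluation looks roughly like the sketch below: several application threads concurrently driving OpenSHMEM puts to a neighbor. The thread-level initialization shown uses the OpenSHMEM 1.4 shmem_init_thread interface, which postdates this paper and stands in for whatever the UCCS-based prototype exposed.

```cpp
// Sketch of the workload pattern the paper evaluates: several application
// threads driving OpenSHMEM communication concurrently. Thread-level
// initialization uses the OpenSHMEM 1.4 interface, which postdates UCCS.
#include <shmem.h>
#include <thread>
#include <vector>

int main() {
    int provided = 0;
    shmem_init_thread(SHMEM_THREAD_MULTIPLE, &provided);
    if (provided != SHMEM_THREAD_MULTIPLE) { shmem_finalize(); return 1; }

    int npes   = shmem_n_pes();
    int pe     = shmem_my_pe();
    int target = (pe + 1) % npes;

    const int NT = 4, CHUNK = 1024;
    long *dst = (long *)shmem_malloc(NT * CHUNK * sizeof(long));
    shmem_barrier_all();

    // Each thread streams its own chunk to the neighbor; with a thread-safe
    // substrate these puts can progress in parallel instead of serializing.
    std::vector<std::thread> workers;
    for (int t = 0; t < NT; ++t)
        workers.emplace_back([=] {
            std::vector<long> src(CHUNK, t);
            shmem_putmem(dst + t * CHUNK, src.data(), CHUNK * sizeof(long), target);
        });
    for (auto &w : workers) w.join();

    shmem_barrier_all();
    shmem_free(dst);
    shmem_finalize();
    return 0;
}
```
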
Title: Analysis of Energy and Performance of PGAS-based Data Access Patterns
Authors: Siddhartha Jana, Joseph Schuchart, B. Chapman
Venue: International Conference on Partitioned Global Address Space Programming Models (PGAS 2014), October 6, 2014
DOI: https://doi.org/10.1145/2676870.2676882
Abstract: One of the factors associated with the usability of distributed programming models on exascale machines is the energy and power cost of data movement across large-scale systems. PGAS implementations provide users with explicit interfaces for one-sided transfers to remote processes. However, a number of factors across the software stack can significantly impact the energy signatures of communication-intensive applications that rely on such transfers. Characteristics such as the use of non-blocking communication, the number of initiated transfers, the size of the data payload packed within each transfer, and the use of pinned-down user buffers all contribute to this impact.

In this paper, we discuss a number of RDMA-based communication patterns that are frequently incorporated within applications and communication libraries and that can significantly affect energy and performance characteristics. We present an empirical study of the potential energy savings by examining the impact on the CPU and DRAM. Since performance is a major criterion for PGAS programming models, we use the energy-delay product as a metric to justify the feasibility of these transformations.

We hope that this work motivates the incorporation of energy-based metrics for fine-tuning PGAS implementations.

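Two of the access patterns the paper varies, many fine-grained transfers versus one aggregated transfer of the same payload, can be sketched as below; the energy-measurement machinery itself is not reproduced here.

```cpp
// Sketch of two access patterns whose energy/performance behavior the paper
// studies: many fine-grained puts versus one aggregated put. Non-blocking
// variants (shmem_putmem_nbi, OpenSHMEM 1.3+) would be a third point of
// comparison but are omitted to keep the sketch short.
#include <shmem.h>
#include <vector>

int main() {
    shmem_init();
    int target = (shmem_my_pe() + 1) % shmem_n_pes();

    const int N = 4096;
    long *remote = (long *)shmem_malloc(N * sizeof(long));
    std::vector<long> local(N, 42);
    shmem_barrier_all();

    // Pattern 1: N tiny transfers; the per-message overhead (and its energy
    // cost) is paid N times.
    for (int i = 0; i < N; ++i)
        shmem_long_p(&remote[i], local[i], target);
    shmem_quiet();                       // complete all outstanding puts

    // Pattern 2: the same payload packed into a single transfer.
    shmem_putmem(remote, local.data(), N * sizeof(long), target);
    shmem_quiet();

    shmem_barrier_all();
    shmem_free(remote);
    shmem_finalize();
    return 0;
}
```
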