International Conference on Partitioned Global Address Space Programming Models: Latest Publications

HabaneroUPC++: a Compiler-free PGAS Library
International Conference on Partitioned Global Address Space Programming Models Pub Date : 2014-10-06 DOI: 10.1145/2676870.2676879
Vivek Kumar, Yili Zheng, Vincent Cavé, Zoran Budimlic, Vivek Sarkar
Abstract: The Partitioned Global Address Space (PGAS) programming models combine shared and distributed memory features, providing the basis for high performance and high productivity parallel programming environments. UPC++ [39] is a very recent PGAS implementation that takes a library-based approach and avoids the complexities associated with compiler transformations. However, this implementation does not support dynamic task parallelism and relies only on other threading models (e.g., OpenMP or pthreads) for exploiting parallelism within a PGAS place.
In this paper, we introduce a compiler-free PGAS library called HabaneroUPC++, which supports a tighter integration of intra-place and inter-place parallelism than standard hybrid programming approaches. The library makes heavy use of C++11 lambda functions in its APIs. C++11 lambdas avoid the need for compiler support while still retaining the syntactic convenience of language-based approaches. The HabaneroUPC++ library implementation is based on a tight integration of the UPC++ library and the Habanero-C++ library, with new extensions to support the integration. The UPC++ library is used to provide PGAS communication and function shipping support using GASNet, and the Habanero-C++ library is used to provide support for intra-place work-stealing integrated with function shipping. We demonstrate the programmability and performance of our implementation using two benchmarks, scaled up to 6K cores. The insights developed in this paper promise to further enhance the usability and popularity of PGAS programming models.
Citations: 47
Development and Extension of Atomic Memory Operations in OpenSHMEM
International Conference on Partitioned Global Address Space Programming Models Pub Date : 2014-10-06 DOI: 10.1145/2676870.2676891
Pavel Shamis, Manjunath Gorentla Venkata, S. Poole, S. Pophale, Mike Dubman, R. Graham, Dror Goldenberg, G. Shainer
Abstract: A distinguishing characteristic of OpenSHMEM compared to other PGAS programming model implementations is its support for atomic memory operations (AMOs). It provides a rich set of AMO interfaces supporting 32-bit and 64-bit datatypes. On most modern networks, network-implemented AMOs are known to outperform software-implemented AMOs. So, to achieve high performance, an OpenSHMEM implementation should try to offload AMOs to the underlying network hardware when possible. Nevertheless, challenges arise when (a) the underlying hardware does not support the full set of atomic operations, (b) more than one device is used, and (c) heterogeneous systems with multiple types of devices are involved. In this paper, we analyze these challenges and discuss potential solutions to address them.
Citations: 0
Evaluation of PGAS Communication Paradigms with Geometric Multigrid
International Conference on Partitioned Global Address Space Programming Models Pub Date : 2014-10-06 DOI: 10.1145/2676870.2676874
H. Shan, A. Kamil, Samuel Williams, Yili Zheng, K. Yelick
Abstract: Partitioned Global Address Space (PGAS) languages and one-sided communication enable application developers to select the communication paradigm that balances the performance needs of applications with the productivity desires of programmers. In this paper, we evaluate three different one-sided communication paradigms in the context of geometric multigrid using the miniGMG benchmark. Although miniGMG's static, regular, and predictable communication does not exploit the ultimate potential of PGAS models, multigrid solvers appear in many contemporary applications and represent one of the most important communication patterns. We use UPC++, a PGAS extension of C++, as the vehicle for our evaluation, though our work is applicable to any of the existing PGAS languages and models. We compare performance with the highly tuned MPI baseline, and the results indicate that the most promising approach towards achieving performance and ease of programming is to use high-level abstractions, such as the multidimensional arrays provided by UPC++, that hide data aggregation and messaging in the runtime library.
Citations: 4
One-Sided Append: A New Communication Paradigm For PGAS Models
International Conference on Partitioned Global Address Space Programming Models Pub Date : 2014-10-06 DOI: 10.1145/2676870.2676886
James Dinan, Mario Flajslik
Abstract: One-sided append represents a new class of one-sided operations that can be used to aggregate messages from multiple communication sources into a single destination buffer. This new communication paradigm is analyzed in terms of its impact on the OpenSHMEM parallel programming model and applications. Implementation considerations are discussed and an accelerated implementation using the Portals 4 networking API is presented. Initial experimental results with the NAS integer sort benchmark indicate that this new operation can significantly improve the communication performance of such applications.
Citations: 0
Hiding latency in Coarray Fortran 2.0
International Conference on Partitioned Global Address Space Programming Models Pub Date : 2010-10-12 DOI: 10.1145/2020373.2020387
William N. Scherer, L. Adhianto, G. Jin, J. Mellor-Crummey, Chaoran Yang
Abstract: In Numrich and Reid's 1998 proposal [17], Coarray Fortran is a simple set of extensions to Fortran 95, principal among which is support for shared data known as coarrays. Responding to shortcomings in the Fortran Standards Committee's addition of coarrays to the Fortran 2008 standard, we at Rice envisioned an extensive update which has come to be known as Coarray Fortran 2.0 [15]. In this paper, we chronicle the evolution of Coarray Fortran 2.0 as it gains support for asynchronous point-to-point and collective operations. We outline how these operations are implemented and describe code fragments from several benchmark programs to show how we use these operations to hide latency by overlapping communication and computation.
Citations: 11
An open-source compiler and runtime implementation for Coarray Fortran
International Conference on Partitioned Global Address Space Programming Models Pub Date : 2010-10-12 DOI: 10.1145/2020373.2020386
Deepak Eachempati, H. Jun, B. Chapman
Abstract: Coarray Fortran (CAF) comprises a set of proposed language extensions to Fortran that are expected to be adopted as part of the Fortran 2008 standard. In contrast to prior open-source implementation efforts, our approach is to use a single, unified compiler infrastructure to translate, optimize and generate binaries from CAF codes. In this paper, we describe our compiler and runtime implementation of CAF using an Open64-based compiler infrastructure. We detail the process by which we generate a high-level intermediate representation (IR) from the CAF code in our compiler's front end, how our compiler analyzes and translates this IR to generate a binary which makes use of our runtime system, and how we support the runtime execution model with our runtime library. We have carried out experiments using both an ARMCI- and a GASNet-based runtime implementation, and we present these results.
Citations: 23
Performance modeling for multilevel communication in SHMEM+
International Conference on Partitioned Global Address Space Programming Models Pub Date : 2010-10-12 DOI: 10.1145/2020373.2020380
V. Aggarwal, C. Yoon, A. George, H. Lam, G. Stitt
Abstract: The field of high-performance computing (HPC) is currently undergoing a major transformation brought about by a variety of new processor device technologies. Accelerator devices (e.g., FPGA, GPU) are becoming increasingly popular as coprocessors in HPC, embedded, and other systems, improving application performance while in some cases also reducing energy consumption. The presence of such devices introduces additional levels of communication and memory hierarchy in the system, which warrants an expansion of conventional parallel-programming practices to address these differences. Programming models and libraries for heterogeneous, parallel, and reconfigurable computing such as SHMEM+ have been developed to support communication and coordination involving a diverse mix of processor devices. However, to evaluate the impact of communication on application performance and obtain optimal performance, a concrete understanding of the underlying communication infrastructure is often imperative. In this paper, we introduce a new multilevel communication model for representing various data transfers encountered in these systems and for predicting performance. Three use cases are presented and evaluated. First, the model enables application developers to perform early design-space exploration of communication patterns in their applications before undertaking the laborious and expensive process of implementation, yielding improved performance and productivity. Second, the model enables system developers to quickly optimize performance of data-transfer routines within tools such as SHMEM+ when being ported to a new platform. Third, the model augments tools such as SHMEM+ to automatically improve performance of data transfers by self-tuning internal parameters to match platform capabilities. Results from experiments with these use cases suggest marked improvement in performance, productivity, and portability.
Citations: 1
Improving UPC productivity via integrated development tools
International Conference on Partitioned Global Address Space Programming Models Pub Date : 2010-10-12 DOI: 10.1145/2020373.2020381
Max Billingsley, Beth Tibbitts, A. George
Abstract: In the world of high-performance computing (HPC), there has been an increased focus in recent years upon the importance of productivity in HPC application development. One crucial aspect of productivity is the programming model used, and the family of partitioned global-address-space (PGAS) models, such as UPC and X10, has served to advance the state of the art in balancing performance and productivity. Also of great importance is the variety of development tools used to support activities such as editing, debugging, and optimizing programs. These tools are often most useful as part of an integrated development environment (IDE). While some progress has been made towards bringing IDE capabilities into the HPC world, in particular by way of Eclipse projects, support has mainly focused on MPI and OpenMP tools.
In this paper, we present research and development activities that are bringing Eclipse-based IDE capabilities to the PGAS developer community. We focus on tools for UPC, giving background on previously existing capabilities to work with UPC programs in Eclipse and then presenting a tool-chain and project wizard for the open-source Berkeley UPC compiler, basic UPC static analysis tools, and integration of our performance analysis tool (Parallel Performance Wizard) supporting UPC. Finally, we conclude by proposing future work and providing recommendations for further integration of UPC and other PGAS tools to enhance overall developer productivity.
Citations: 1
Hybrid PGAS runtime support for multicore nodes
International Conference on Partitioned Global Address Space Programming Models Pub Date : 2010-10-12 DOI: 10.1145/2020373.2020376
F. Blagojevic, Paul H. Hargrove, Costin Iancu, K. Yelick
Abstract: With multicore processors as the standard building block for high performance systems, parallel runtime systems need to provide excellent performance on shared memory, distributed memory, and hybrids. Conventional wisdom suggests that threads should be used as the runtime mechanism within shared memory, and two runtime versions for shared and distributed memory are often designed and implemented separately, retrofitting after the fact for hybrid systems. In this paper we consider the problem of implementing a runtime layer for Partitioned Global Address Space (PGAS) languages, which offer a uniform programming abstraction for hybrid machines. We present a new process-based shared memory runtime and compare it to our previous pthreads implementation. Both are integrated with the GASNet communication layer, and they can co-exist with one another. We evaluate the shared memory runtime approaches, showing that they interact in important and sometimes surprising ways with the communication layer. Using a set of microbenchmarks and application level benchmarks on an IBM BG/P, Cray XT, and InfiniBand cluster, we show that threads, processes and combinations of both are needed for maximum performance. Our new runtime shows speedups of over 60% for application benchmarks and 100% for collective communication benchmarks, when compared to the previous implementation. Our work primarily targets PGAS languages, but some of the lessons are relevant to other parallel runtime systems and libraries.
Citations: 38
Asynchronous PGAS runtime for Myrinet networks
International Conference on Partitioned Global Address Space Programming Models Pub Date : 2010-10-12 DOI: 10.1145/2020373.2020377
Montse Farreras, G. Almási
Abstract: PGAS languages aim to enhance productivity for large scale systems. The IBM Asynchronous PGAS runtime (APGAS) supports various high productivity programming languages including UPC, X10 and CAF. The runtime has been designed for scalability and performance portability, and it includes optimized implementations for the LAPI and Blue Gene DCMF communication subsystems.
This paper presents an optimized implementation of the IBM APGAS runtime for Myrinet networks, on top of the MX communication library. It explains the challenges of implementing a one-sided communication model (APGAS) on top of a two-sided communication API such as MX.
We show that our implementation outperforms the Berkeley GASNet runtime in terms of latency and bandwidth. We also demonstrate scalability of various HPC benchmarks up to 1024 processes.
Citations: 8