Latest publications: International Conference on Partitioned Global Address Space Programming Models

Development and performance analysis of a UPC Particle-in-Cell code
Pub Date: 2010-10-12 | DOI: 10.1145/2020373.2020383
S. Markidis, G. Lapenta
{"title":"Development and performance analysis of a UPC Particle-in-Cell code","authors":"S. Markidis, G. Lapenta","doi":"10.1145/2020373.2020383","DOIUrl":"https://doi.org/10.1145/2020373.2020383","url":null,"abstract":"The development and the implementation of a Particle-in-Cell code written in the Unified Parallel C (UPC) language for plasma simulations with application to astrophysics and fusion nuclear energy machines are presented. A simple one dimensional electrostatic Particle-in-Cell code has been developed first to investigate the implementation details in the UPC language, and second to study the UPC performance on parallel computers. The initial simulations of plasmas with the UPC Particle-in-Cell code and a study of parallel speed-up of the UPC code up to 128 cores are shown.","PeriodicalId":245693,"journal":{"name":"International Conference on Partitioned Global Address Space Programming Models","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133635863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
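
The abstract above outlines the structure of a PIC time step: particles gather the field at their grid cell, then velocities and positions are advanced. The following is a minimal, hypothetical UPC sketch of that push phase, not the authors' code; the array sizes, physical constants, and nearest-grid-point interpolation are illustrative, and the charge-deposition (scatter) step is only noted in a comment.

/* pic_sketch.upc -- compile with a UPC compiler, e.g. `upcc pic_sketch.upc` */
#include <upc.h>
#include <stdio.h>

#define NP_PER_THREAD 4096   /* particles per thread (illustrative)   */
#define NG_PER_THREAD 256    /* grid points per thread (illustrative) */
#define NP (NP_PER_THREAD * THREADS)
#define NG (NG_PER_THREAD * THREADS)

/* Shared (PGAS) arrays: default cyclic layout across UPC threads. */
shared double x[NP];   /* particle positions  */
shared double v[NP];   /* particle velocities */
shared double E[NG];   /* grid electric field */

int main(void) {
    const double dt = 0.1, dx = 1.0, qm = -1.0;  /* illustrative constants */
    const double L = NG * dx;                    /* domain length          */
    int p;

    /* Initialize only the particles this thread has affinity to. */
    upc_forall (p = 0; p < NP; p++; &x[p]) {
        x[p] = (double)p * L / NP;
        v[p] = 0.0;
    }
    upc_barrier;

    /* Particle push: gather E at the nearest grid point, advance v and x.
       E[cell] may live on another thread; the UPC runtime fetches it. */
    upc_forall (p = 0; p < NP; p++; &x[p]) {
        int cell = (int)(x[p] / dx) % NG;
        v[p] += qm * E[cell] * dt;
        x[p] += v[p] * dt;
        if (x[p] < 0.0)      x[p] += L;   /* periodic boundary */
        else if (x[p] >= L)  x[p] -= L;
    }
    upc_barrier;

    /* The charge-deposition (scatter) step is omitted here; in a real PIC
       code it needs care to avoid races when threads update a shared rho[]. */
    if (MYTHREAD == 0)
        printf("pushed %d particles on %d UPC threads\n", NP, THREADS);
    return 0;
}
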
Unifying UPC and MPI runtimes: experience with MVAPICH
Pub Date: 2010-10-12 | DOI: 10.1145/2020373.2020378
Jithin Jose, Miao Luo, S. Sur, D. Panda
{"title":"Unifying UPC and MPI runtimes: experience with MVAPICH","authors":"Jithin Jose, Miao Luo, S. Sur, D. Panda","doi":"10.1145/2020373.2020378","DOIUrl":"https://doi.org/10.1145/2020373.2020378","url":null,"abstract":"Unified Parallel C (UPC) is an emerging parallel programming language that is based on a shared memory paradigm. MPI has been a widely ported and dominant parallel programming model for the past couple of decades. Real-life scientific applications require a lot of investment by domain scientists. Many scientists choose the MPI programming model as it is considered low-risk. It is unlikely that entire applications will be re-written using the emerging UPC language (or PGAS paradigm) in the near future. It is more likely that parts of these applications will be converted to newer models. This requires that underlying implementation of system software be able to support both UPC and MPI simultaneously. Unfortunately, the current state-of-the-art of UPC and MPI interoperability leaves much to be desired both in terms of performance and ease-of-use.\u0000 In this paper, we propose \"Integrated Native Communication Runtime\" (INCR) for MPI and UPC communications on InfiniBand clusters. Our library is capable of supporting both UPC and MPI communications simultaneously. This runtime is based on the widely used MVAPICH (MPI over InfiniBand) Aptus runtime, which is known to scale to tens-of-thousands of cores. Our evaluation reveals that INCR is able to deliver equal or better performance compared to the existing UPC runtime - GASNet on InfiniBand verbs. We observe that with UPC NAS benchmarks CG and MG (class B) at 128 processes, we outperform current GASNet implementation by 10% and 23%, respectively.","PeriodicalId":245693,"journal":{"name":"International Conference on Partitioned Global Address Space Programming Models","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128627250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 55
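
To make the interoperability scenario concrete, here is a hypothetical hybrid kernel in which an MPI application has one phase ported to UPC shared arrays. It assumes a runtime such as the paper's INCR (or another UPC-with-MPI interoperability setup) that lets both models coexist in one executable with one MPI rank per UPC thread; the array size and the reduction are illustrative, not taken from the paper.

/* hybrid_sketch.upc -- hypothetical UPC+MPI hybrid; requires a unified or
 * interoperable runtime so that both libraries can be active at once. */
#include <upc.h>
#include <mpi.h>
#include <stdio.h>

#define N_PER_THREAD 1024

shared double data[N_PER_THREAD * THREADS];  /* UPC shared array */

int main(int argc, char **argv) {
    int rank, i;
    double local_sum = 0.0, global_sum = 0.0;

    MPI_Init(&argc, &argv);                 /* legacy MPI portion */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Newly ported UPC portion: each thread touches only the elements
       it has affinity to and accumulates a private partial sum. */
    upc_forall (i = 0; i < N_PER_THREAD * THREADS; i++; &data[i]) {
        data[i] = (double)i;
        local_sum += data[i];
    }
    upc_barrier;

    /* Remaining MPI portion: combine the per-process partial sums. */
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", global_sum);

    MPI_Finalize();
    return 0;
}
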
Introducing OpenSHMEM: SHMEM for the PGAS community
Pub Date: 2010-10-12 | DOI: 10.1145/2020373.2020375
B. Chapman, Tony Curtis, S. Pophale, S. Poole, J. Kuehn, C. Koelbel, Lauren Smith
{"title":"Introducing OpenSHMEM: SHMEM for the PGAS community","authors":"B. Chapman, Tony Curtis, S. Pophale, S. Poole, J. Kuehn, C. Koelbel, Lauren Smith","doi":"10.1145/2020373.2020375","DOIUrl":"https://doi.org/10.1145/2020373.2020375","url":null,"abstract":"The OpenSHMEM community would like to announce a new effort to standardize SHMEM, a communications library that uses one-sided communication and utilizes a partitioned global address space.\u0000 OpenSHMEM is an effort to bring together a variety of SHMEM and SHMEM-like implementations into an open standard using a community-driven model. By creating an open-source specification and reference implementation of OpenSHMEM, there will be a wider availability of a PGAS library model on current and future architectures. In addition, the availability of an OpenSHMEM model will enable the development of performance and validation tools.\u0000 We propose an OpenSHMEM specification to help tie together a number of divergent implementations of SHMEM that are currently available.\u0000 To support an existing and growing user community, we will develop the OpenSHMEM web presence, including a community wiki and training material, and face-to-face interaction, including workshops and conference participation.","PeriodicalId":245693,"journal":{"name":"International Conference on Partitioned Global Address Space Programming Models","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117158256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 219
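
As background for the library being standardized, the sketch below shows the one-sided, symmetric-memory style of programming that SHMEM supports: every processing element (PE) allocates a remotely accessible buffer and writes into a neighbor's copy with a put, with no matching receive on the target. Function names follow the later OpenSHMEM specification and are used here only for illustration.

/* shmem_sketch.c -- compile with an OpenSHMEM wrapper, e.g. `oshcc shmem_sketch.c` */
#include <shmem.h>
#include <stdio.h>

int main(void) {
    shmem_init();
    int me   = shmem_my_pe();   /* this processing element */
    int npes = shmem_n_pes();   /* total number of PEs     */

    /* Symmetric allocation: every PE gets a remotely accessible buffer. */
    int *dest = (int *)shmem_malloc(sizeof(int));
    *dest = -1;
    shmem_barrier_all();

    /* One-sided put: write my PE number into the next PE's buffer. */
    int src = me;
    shmem_int_put(dest, &src, 1, (me + 1) % npes);
    shmem_barrier_all();

    printf("PE %d of %d received %d\n", me, npes, *dest);

    shmem_free(dest);
    shmem_finalize();
    return 0;
}
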
Numerical Python for scalable architectures
Pub Date: 2010-10-12 | DOI: 10.1145/2020373.2020388
M. R. B. Kristensen, B. Vinter
{"title":"Numerical Python for scalable architectures","authors":"M. R. B. Kristensen, B. Vinter","doi":"10.1145/2020373.2020388","DOIUrl":"https://doi.org/10.1145/2020373.2020388","url":null,"abstract":"In this paper, we introduce DistNumPy, a library for doing numerical computation in Python that targets scalable distributed memory architectures. DistNumPy extends the NumPy module[15], which is popular for scientific programming. Replacing NumPy with Dist-NumPy enables the user to write sequential Python programs that seamlessly utilize distributed memory architectures. This feature is obtained by introducing a new backend for NumPy arrays, which distribute data amongst the nodes in a distributed memory multi-processor. All operations on this new array will seek to utilize all available processors. The array itself is distributed between multiple processors in order to support larger arrays than a single node can hold in memory.\u0000 We perform three experiments of sequential Python programs running on an Ethernet based cluster of SMP-nodes with a total of 64 CPU-cores. The results show an 88% CPU utilization when running a Monte Carlo simulation, 63% CPU utilization on an N-body simulation and a more modest 50% on a Jacobi solver. The primary limitation in CPU utilization is identified as SMP limitations and not the distribution aspect. Based on the experiments we find that it is possible to obtain significant speedup from using our new array-backend without changing the original Python code.","PeriodicalId":245693,"journal":{"name":"International Conference on Partitioned Global Address Space Programming Models","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130525278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 20
Introducing mNUMA: an extended PGAS architecture
Pub Date: 2010-10-12 | DOI: 10.1145/2020373.2020379
Megan Vance, P. Kogge
{"title":"Introducing mNUMA: an extended PGAS architecture","authors":"Megan Vance, P. Kogge","doi":"10.1145/2020373.2020379","DOIUrl":"https://doi.org/10.1145/2020373.2020379","url":null,"abstract":"We describe design details of a Light Weight Processing migration-NUMA architecture, a novel high performance system design that provides hardware support for a partitioned global address space, migrating subjects, and word level synchronization primitives. Using the architectural definition, combinations of structures are shown to work together to carry out basic actions such as address translation, migration, in-memory synchronization, and work management. We present results from simulation of microkernels showing that LWP-mNUMA compensates for latency with far greater memory access concurrency than possible on a conventional systems. In particular, several microkernels model tough, irregular access patterns that have limited speedups -- in certain problem areas -- to dozens of conventional processors. On these, results show speedup increasing up to 1024 multicore mNUMA processing nodes, running over 1 million threadlets.","PeriodicalId":245693,"journal":{"name":"International Conference on Partitioned Global Address Space Programming Models","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115889535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
Predicting remote reuse distance patterns in UPC applications
Pub Date: 2010-10-12 | DOI: 10.1145/2020373.2020374
Steven Vormwald, Wei Wang, S. Carr, S. Seidel, Z. Wang
{"title":"Predicting remote reuse distance patterns in UPC applications","authors":"Steven Vormwald, Wei Wang, S. Carr, S. Seidel, Z. Wang","doi":"10.1145/2020373.2020374","DOIUrl":"https://doi.org/10.1145/2020373.2020374","url":null,"abstract":"Current work in high productivity parallel computing has focused attention on the class of partitioned global address space (PGAS) parallel programming languages because they promise to reduce the effort required to develop parallel application codes. An important aspect in achieving good performance in PGAS languages is effective handling of remote memory references. We extend a single-threaded reuse distance model to predict memory behavior for multi-threaded UPC applications. Our model handles changes in per-thread data size as well as changes in thread mapping due to problem size increases. Our results indicate the model provides good predictions of remote memory behavior by accurately predicting changes in remote memory reuse distance as a function of the problem size and the number of threads.","PeriodicalId":245693,"journal":{"name":"International Conference on Partitioned Global Address Space Programming Models","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131604101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
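
For readers unfamiliar with the metric, reuse distance (LRU stack distance) is the number of distinct memory blocks touched between two references to the same block. The sketch below measures it for a toy reference trace; it is plain single-trace bookkeeping, not the paper's prediction model, and the trace values and stack capacity are made up.

/* reuse_distance_sketch.c -- basic LRU stack-distance measurement. */
#include <stdio.h>

#define MAX_BLOCKS 1024   /* capacity of the toy LRU stack (assumed) */

static long stack_[MAX_BLOCKS];  /* most recently used block at index 0 */
static int  depth = 0;

/* Return the reuse distance of block `b` (-1 for a cold miss) and move it
   to the top of the LRU stack. O(depth) per reference; real tools use
   trees or sampling, but the definition is the same. */
static int reuse_distance(long b) {
    int i, found = -1;
    for (i = 0; i < depth; i++)
        if (stack_[i] == b) { found = i; break; }

    if (found < 0) {                         /* cold miss: push on top */
        if (depth < MAX_BLOCKS) depth++;
        found = depth - 1;
        for (i = found; i > 0; i--) stack_[i] = stack_[i - 1];
        stack_[0] = b;
        return -1;
    }
    for (i = found; i > 0; i--) stack_[i] = stack_[i - 1];
    stack_[0] = b;
    return found;      /* # distinct blocks referenced since last use of b */
}

int main(void) {
    /* A toy trace of block addresses (e.g. remote references by one thread). */
    long trace[] = { 1, 2, 3, 1, 2, 1, 4, 3 };
    int n = (int)(sizeof trace / sizeof trace[0]);

    for (int i = 0; i < n; i++)
        printf("block %ld -> reuse distance %d\n",
               trace[i], reuse_distance(trace[i]));
    return 0;
}
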
Extensible PGAS semantics for C++
Pub Date: 2010-10-12 | DOI: 10.1145/2020373.2020385
N. Edmonds, Douglas P. Gregor, A. Lumsdaine
{"title":"Extensible PGAS semantics for C++","authors":"N. Edmonds, Douglas P. Gregor, A. Lumsdaine","doi":"10.1145/2020373.2020385","DOIUrl":"https://doi.org/10.1145/2020373.2020385","url":null,"abstract":"The Partitioned Global Address Space model combines the expression of data locality in SPMD applications, which is crucial to achieving good parallel performance, with the relative simplicity of the Distributed Shared Memory model. C++ currently lacks language support for PGAS semantics; however, C++ is an excellent host language for implementing Domain-Specific Embedded Languages (DSELs). Leveraging these capabilities of C++, we have implemented the Partitioned Global Property Map, a DSEL library supporting PGAS semantics, polymorphic partitioned global data structures, and a number of useful extensions. The Partitioned Global Property Map library utilizes template meta-programming to allow direct mapping at compile-time of high-level semantics to efficient underlying implementations. It combines flexible/extensible semantics, high performance, and portability across different low-level communication interfaces to allow PGAS programs to be expressed in C++.","PeriodicalId":245693,"journal":{"name":"International Conference on Partitioned Global Address Space Programming Models","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130601693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
X10-enabled MapReduce
Pub Date: 2010-10-12 | DOI: 10.1145/2020373.2020382
H. Dong, Shujia Zhou, D. Grove
{"title":"X10-enabled MapReduce","authors":"H. Dong, Shujia Zhou, D. Grove","doi":"10.1145/2020373.2020382","DOIUrl":"https://doi.org/10.1145/2020373.2020382","url":null,"abstract":"The MapReduce framework has become a popular and powerful tool to process large datasets in parallel over a cluster of computing nodes [1]. Currently, there are many flavors of implementations of MapReduce, among which the most popular is the Hadoop implementation in Java [5]. However, these implementations either rely on third-party file systems for across-computer-node communication or are difficult to implement with socket programming or communication libraries such as MPI. To address these challenges, we investigated utilizing the X10 language to implement MapReduce and tested it with the word-count use case. The key performance factor in implementing MapReduce is data moving across different computer nodes. Since X10 has built-in functions for across-node communication such as distributed arrays [2], a major challenge with MapReduce implementations is easily solved. We tested two main implementations: the first utilizes the HashMap data structure and the second a Rail with elements consisting of a string and integer pair. The performance of these two implementations are analyzed and discussed.","PeriodicalId":245693,"journal":{"name":"International Conference on Partitioned Global Address Space Programming Models","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127795765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
XcalableMP implementation and performance of NAS Parallel Benchmarks
Pub Date: 2010-10-12 | DOI: 10.1145/2020373.2020384
M. Nakao, Jinpil Lee, T. Boku, M. Sato
{"title":"XcalableMP implementation and performance of NAS Parallel Benchmarks","authors":"M. Nakao, Jinpil Lee, T. Boku, M. Sato","doi":"10.1145/2020373.2020384","DOIUrl":"https://doi.org/10.1145/2020373.2020384","url":null,"abstract":"XcalableMP is a parallel extension of existing languages, such as C and Fortran, that was proposed as a new programming model to facilitate program parallel applications for distributed memory systems. In order to investigate the performance of parallel programs written in XcalableMP, we have implemented NAS Parallel Benchmarks, specifically, the Embarrassingly Parallel (EP) benchmark, the Integer Sort (IS) benchmark, and the Conjugate Gradient (CG) benchmark, using XcalableMP. The results show that the performance of XcalableMP is comparable to that of MPI. In particular, the performances of IS with a histogram and CG with two-dimensional parallelization achieve almost the same performance. The results also demonstrate that XcalableMP allows a programmer to write efficient parallel applications at a lower programming cost.","PeriodicalId":245693,"journal":{"name":"International Conference on Partitioned Global Address Space Programming Models","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125775790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 17
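
To illustrate the directive-based, global-view style the abstract refers to, here is a small hypothetical XcalableMP example in C: a template is block-distributed over the executing nodes, an array is aligned with it, and a work-sharing loop with a reduction sums the local blocks. The array size and node layout are illustrative, and a compiler without XMP support would simply ignore the pragmas and run the loop serially.

/* xmp_sketch.c -- a minimal global-view XcalableMP sketch (directives on C). */
#include <stdio.h>

#define N 1024

#pragma xmp nodes p(*)                  /* all executing nodes             */
#pragma xmp template t(0:N-1)           /* a template of N index points    */
#pragma xmp distribute t(block) onto p  /* block-distribute the template   */

double a[N];
#pragma xmp align a[i] with t(i)        /* align the array with it         */

int main(void) {
    int i;
    double sum = 0.0;

    /* Each node initializes and sums only the block of `a` it owns;
       the reduction clause combines the partial sums across nodes. */
#pragma xmp loop on t(i) reduction(+:sum)
    for (i = 0; i < N; i++) {
        a[i] = (double)i;
        sum += a[i];
    }

    printf("sum = %f\n", sum);  /* after the reduction, every node holds it */
    return 0;
}
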