{"title":"OpenSHMEM Non-blocking Data Movement Operations with MVAPICH2-X: Early Experiences","authors":"Khaled Hamidouche, Jie Zhang, D. Panda, K. Tomko","doi":"10.1109/PAW.2016.7","DOIUrl":"https://doi.org/10.1109/PAW.2016.7","url":null,"abstract":"PGAS models with a lightweight synchronization and shared memory abstraction, are seen as a good alternative to the Message Passing model for irregular communication patterns. OpenSHMEM is a library based PGAS model. OpenSHMEM 1.3 introduced Non-Blocking data movement operations to provide better asynchronous progress and overlap. In this paper, we present our experiences in designing Non-Blocking Put and Get operations on InfiniBand systems. Using the MVAPICH2-X runtime, we present the alternative designs for intra-node and inter-node operations. We also present a set of new benchmarks to analyze the latency, message rate performance, and communication/computation overlap benefits. The performance evaluation shows 7X improvement in the message rate. Furthermore, using a 3D-Stencil based application kernel, we assess the benefits of OpenSHMEM Non-Blocking extensions. We show 50% and 28% improvement on 27 and 64 processes, respectively.","PeriodicalId":383847,"journal":{"name":"2016 PGAS Applications Workshop (PAW)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127556647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of PGAS Programming to Power Grid Simulation","authors":"B. Palmer","doi":"10.1109/PAW.2016.10","DOIUrl":"https://doi.org/10.1109/PAW.2016.10","url":null,"abstract":"This paper will describe the application of the PGAS Global Arrays (GA) library to power grid simulations. The GridPACK™ framework has been designed to enable power grid engineers to develop parallel simulations of the power grid by providing a set of templates and libraries that encapsulate most of the details of parallel programming in higher level abstractions. The communication portions of the framework are implemented using a combination of message-passing (MPI) and one-sided communication (GA). This paper will provide a brief overview of GA and describe in detail the implementation of collective hash tables, which are used in many power grid applications to match data with a previously distributed network.","PeriodicalId":383847,"journal":{"name":"2016 PGAS Applications Workshop (PAW)","volume":"279 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114107368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experiences of Applying One-Sided Communication to Nearest-Neighbor Communication","authors":"H. Shan, Samuel Williams, Yili Zheng, Weiqun Zhang, Bei Wang, S. Ethier, Zhengji Zhao","doi":"10.1109/PAW.2016.8","DOIUrl":"https://doi.org/10.1109/PAW.2016.8","url":null,"abstract":"Nearest-neighbor communication is one of the most important communication patterns appearing in many scientific applications. In this paper, we discuss the results of applying UPC++, a library-based partitioned global address space (PGAS) programming extension to C++, to an adaptive mesh framework (BoxLib), and a full scientific application GTC-P, whose communications are dominated by the nearest-neighbor communication. The results on a Cray XC40 system show that compared with the highly-tuned MPI two-sided implementations, UPC++ improves the communication performance up to 60% and 90% for BoxLib and GTC-P, respectively. We also implement the nearest-neighbor communication using MPI one-sided messages. The performance comparison demonstrates that the MPI one-sided implementation can also improve the communication performance over the two-sided version but not so significantly as UPC++ does.","PeriodicalId":383847,"journal":{"name":"2016 PGAS Applications Workshop (PAW)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133537935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}