{"title":"Large Scale Frequent Pattern Mining Using MPI One-Sided Model","authors":"Abhinav Vishnu, Khushbu Agarwal","doi":"10.1109/CLUSTER.2015.30","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.30","url":null,"abstract":"In this paper, we propose a work-stealing runtime - Library for Work Stealing (LibWS) - using the MPI one-sided model for designing a scalable FP-Growth - the de facto frequent pattern mining algorithm - on large-scale systems. LibWS provides locality-efficient and highly scalable work-stealing techniques for load balancing on a variety of data distributions. We also propose a novel communication algorithm for the FP-Growth data exchange phase, which reduces the communication complexity from the state-of-the-art θ(p) to θ(f + p/f), for p processes and f frequent attribute-ids. FP-Growth is implemented using LibWS and evaluated on several work distributions and support counts. An experimental evaluation of FP-Growth on LibWS using 4096 processes on an InfiniBand cluster demonstrates excellent efficiency for several work distributions (91% efficiency for Power-law and 93% for Poisson). The proposed distributed FP-Tree merging algorithm provides a 38x communication speedup on 4096 cores.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"11218 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132775416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
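A back-of-the-envelope sketch of the communication-cost claim in the abstract above: the proposed exchange costs θ(f + p/f) versus the state-of-the-art θ(p). The concrete values (p = 4096 processes, f = 64 attribute-ids) are illustrative assumptions, not figures from the paper; they happen to give a ratio in the same ballpark as the reported 38x speedup.

```python
# Compare the two communication complexities from the abstract:
# naive all-to-all exchange, Theta(p), versus the grouped exchange,
# Theta(f + p/f), for p processes and f frequent attribute-ids.

def exchange_cost_naive(p: int) -> float:
    """Each process communicates with all p peers: Theta(p)."""
    return float(p)

def exchange_cost_grouped(p: int, f: int) -> float:
    """Grouped exchange over f attribute-ids: Theta(f + p/f)."""
    return f + p / f

p, f = 4096, 64  # illustrative assumption; f + p/f is minimized near f = sqrt(p)
naive = exchange_cost_naive(p)
grouped = exchange_cost_grouped(p, f)
print(f"naive: {naive:.0f}, grouped: {grouped:.0f}, ratio: {naive / grouped:.0f}x")
```

Note that for fixed p the grouped cost f + p/f is minimized at f = sqrt(p), which is why the scheme pays off most when the number of frequent attribute-ids is moderate relative to the process count.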
{"title":"Exploring the Suitability of Remote GPGPU Virtualization for the OpenACC Programming Model Using rCUDA","authors":"Adrián Castelló, Antonio J. Peña, R. Mayo, P. Balaji, E. S. Quintana‐Ortí","doi":"10.1109/CLUSTER.2015.23","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.23","url":null,"abstract":"OpenACC is an application programming interface (API) that aims to unleash the power of heterogeneous systems composed of CPUs and accelerators such as graphics processing units (GPUs) or Intel Xeon Phi coprocessors. This directive-based programming model is intended to enable developers to accelerate their applications' execution with much less effort. Coprocessors offer significant computing power, but in many cases these devices remain largely underused because not all parts of applications match the accelerator architecture. Remote accelerator virtualization frameworks introduce a means to address this problem. In particular, the remote CUDA virtualization middleware rCUDA provides transparent remote access to any GPU installed in a cluster. Combining these two technologies, OpenACC and rCUDA, in a single scenario is naturally appealing. In this work we explore how the different OpenACC directives behave on top of a remote GPGPU virtualization technology in two different hardware configurations. Our experimental evaluation reveals favorable performance results when the two technologies are combined, showing low overhead and similar scaling factors when executing OpenACC-enabled directives.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129379015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GO-Docker: A Batch Scheduling System with Docker Containers","authors":"Olivier Sallou, Cyril Monjeaud","doi":"10.1109/CLUSTER.2015.89","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.89","url":null,"abstract":"A multi-user, open-source batch scheduling system based on Docker containers, with custom scheduler and executor plugin mechanisms.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129251221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Explicit Hydrodynamics for Power, Energy, and Performance","authors":"E. León, I. Karlin, Ryan E. Grant","doi":"10.1109/CLUSTER.2015.12","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.12","url":null,"abstract":"Practical considerations for future supercomputer designs will impose limits on both instantaneous power consumption and total energy consumption. Working within these constraints while providing the maximum possible performance, application developers will need to optimize their code for speed alongside power and energy concerns. This paper analyzes the effectiveness of several code optimizations, including loop fusion, data structure transformations, and global allocations. A per-component measurement and analysis of different architectures is performed, enabling the examination of code optimizations on different compute subsystems. Using an explicit hydrodynamics proxy application from the U.S. Department of Energy, LULESH, we show how code optimizations impact different computational phases of the simulation. This provides insight for simulation developers into the best optimizations to use during particular simulation compute phases when optimizing code for future supercomputing platforms. We examine and contrast both x86 and Blue Gene architectures with respect to these optimizations.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124124513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing I/O for Petascale Seismic Simulations on Unstructured Meshes","authors":"Sebastian Rettenberger, M. Bader","doi":"10.1109/CLUSTER.2015.51","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.51","url":null,"abstract":"SeisSol simulates earthquake dynamics by coupling seismic wave propagation and dynamic rupture simulations with high-order accuracy on fully adaptive, unstructured meshes. In this paper we present an optimization of SeisSol's I/O implementations to establish a workflow that supports petascale simulations on large unstructured datasets. Our implementations can handle meshes with more than 1 billion cells and 660 billion degrees of freedom. The results show that SeisSol can initialize the mesh structure within 35 seconds on 2048 SuperMUC nodes from our new optimized mesh format. For the wave field output we implemented carefully tuned I/O routines based on HDF5 and MPI-IO. With an aggregation strategy we are able to increase the write bandwidth from 832 MiB/s to 6.7 GiB/s on 2048 SuperMUC nodes.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"273 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121360562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Relativistic High-Resolution Shock-Capturing for Heterogeneous Computing","authors":"F. Glines, Matthew Anderson, D. Neilsen","doi":"10.1109/CLUSTER.2015.110","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.110","url":null,"abstract":"A shift is underway in high performance computing (HPC) towards heterogeneous parallel architectures that emphasize medium- and fine-grain thread parallelism. Many scientific computing algorithms, including simple finite-differencing methods, have already been mapped to heterogeneous architectures with order-of-magnitude gains in performance as a result. Recent case studies examining high-resolution shock-capturing (HRSC) algorithms suggest that these finite-volume methods are good candidates for emerging heterogeneous architectures. HRSC methods form a key scientific kernel for compressible inviscid solvers that appear in astrophysics and engineering applications and tend to require enormous memory and computing resources. This work presents a case study of an HRSC method executed on a heterogeneous parallel architecture utilizing hundreds of GPU-enabled nodes with remote direct memory access to the GPUs for a non-trivial shock application using the relativistic magnetohydrodynamics model.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126223668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling Tractable Exploration of the Performance of Adaptive Mesh Refinement","authors":"C. Vaughan, R. Barrett","doi":"10.1109/CLUSTER.2015.129","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.129","url":null,"abstract":"A broad range of physical phenomena in science and engineering can be explored using finite difference and volume based application codes. Incorporating Adaptive Mesh Refinement (AMR) into these codes focuses attention on the most critical parts of a simulation, enabling increased numerical accuracy of the solution while limiting memory consumption. However, adaptivity comes at the cost of increased runtime complexity, which is particularly challenging on emerging and expected future architectures. In order to explore the design space offered by new computing environments, we have developed a proxy application called miniAMR. MiniAMR exposes a range of the important issues that will significantly impact the performance potential of full application codes. In this paper, we describe miniAMR, demonstrate what it is designed to represent in a full application code, and illustrate how it can be used to exploit future high performance computing architectures. To ensure an accurate understanding of what miniAMR is intended to represent, we compare it with CTH, a shock hydrodynamics code in heavy use throughout several computational science and engineering communities.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126433867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Model-Driven Parallel I/O Performance Tuning","authors":"Babak Behzad, S. Byna, Stefan M. Wild, Prabhat, M. Snir","doi":"10.1109/CLUSTER.2015.37","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.37","url":null,"abstract":"Parallel I/O performance depends highly on the interactions among multiple layers of the parallel I/O stack. The most common layers include high-level I/O libraries, MPI-IO middleware, and the parallel file system. Each of these layers offers various tunable parameters to control intermediary data transfer points and the final data layout. Due to the interdependencies and the number of combinations of parameters, finding a good set of parameter values for a specific application's I/O pattern is challenging. Recent efforts, such as autotuning with genetic algorithms (GAs) and analytical models, have several limitations. For instance, analytical models fail to capture the dynamic nature of shared supercomputing systems and are application-specific. GA-based tuning requires running many time-consuming experiments for each input size. In this paper, we present a strategy to automatically generate an empirical model for a given application pattern. Using a set of real measurements from running an I/O kernel as a training set, we generate a nonlinear regression model. We use this model to predict the top-20 tunable parameter values that give efficient I/O performance and rerun the I/O kernel to select the best set of parameters under the current conditions as tunable parameters for future runs of the same I/O kernel. Using this approach, we demonstrate 6X-94X speedup over the default I/O time for different I/O kernels running on multiple HPC systems. We also evaluate performance by identifying interdependencies among different sets of tunable parameters.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114213226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
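The tuning loop described in the abstract above (fit an empirical model on a few measured runs, predict the best candidate parameter sets, re-run only those) can be sketched in miniature. This is a hedged illustration, not the paper's method: a simple log-linear least-squares fit stands in for their nonlinear regression, the `stripe_count` parameter and all "measured" times are invented, and only a top-3 shortlist is kept rather than the paper's top-20.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Synthetic training runs: (hypothetical stripe_count, measured I/O time in s).
training = [(1, 40.0), (4, 21.0), (16, 11.0), (64, 6.5)]
a, b = fit_linear([math.log2(s) for s, _ in training],
                  [math.log2(t) for _, t in training])

def predict(stripe_count):
    """Predicted I/O time under the fitted log-linear surrogate model."""
    return 2 ** (a + b * math.log2(stripe_count))

# Rank every candidate setting by predicted time; keep a shortlist to re-run
# for real and pick the winner under current system conditions.
candidates = [2 ** i for i in range(8)]  # stripe counts 1 .. 128
top_k = sorted(candidates, key=predict)[:3]
print(top_k)
```

The point of the shortlist is that only a handful of configurations are ever re-measured, which is what makes the approach cheaper than exhaustive or GA-based search.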
{"title":"MPC: A Massively Parallel Compression Algorithm for Scientific Data","authors":"Annie Yang, Hari Mukka, Farbod Hesaaraki, Martin Burtscher","doi":"10.1109/CLUSTER.2015.59","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.59","url":null,"abstract":"Due to their high peak performance and energy efficiency, massively parallel accelerators such as GPUs are quickly spreading in high-performance computing, where large amounts of floating-point data are processed, transferred, and stored. Such environments can greatly benefit from data compression if done sufficiently quickly. Unfortunately, most conventional compression algorithms are unsuitable for highly parallel execution. In fact, it is generally unknown how to design good compression algorithms for massively parallel systems. To remedy this situation, we study 138,240 lossless compression algorithms for single- and double-precision floating-point values that are built exclusively from easily parallelizable components. We analyze the best of these algorithms, explain why they compress well, and derive the Massively Parallel Compression (MPC) algorithm from them. This novel algorithm requires almost no internal state, achieves heretofore unreached compression ratios on several data sets, and roughly matches the best CPU-based algorithms in compression ratio while outperforming them by one to two orders of magnitude in throughput.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115811548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
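A toy illustration (not the actual MPC algorithm) of the kind of "easily parallelizable component" the abstract above refers to: reinterpret doubles as 64-bit integers and delta-encode them. On smooth scientific data the deltas cluster near zero, which helps a later coding stage, and both the encode (a shifted subtract) and the decode (a prefix sum) map naturally onto data-parallel GPU scans.

```python
import struct

def doubles_to_bits(values):
    """Reinterpret each IEEE-754 double as an unsigned 64-bit integer."""
    return [struct.unpack("<Q", struct.pack("<d", v))[0] for v in values]

def bits_to_doubles(bits):
    """Inverse reinterpretation: 64-bit integers back to doubles."""
    return [struct.unpack("<d", struct.pack("<Q", b))[0] for b in bits]

def delta_encode(bits):
    # Subtract each predecessor modulo 2**64; trivially parallel as a
    # shifted element-wise subtraction.
    return [bits[0]] + [(bits[i] - bits[i - 1]) % 2 ** 64
                        for i in range(1, len(bits))]

def delta_decode(deltas):
    # Inverse is a running (prefix) sum modulo 2**64 - a classic parallel scan.
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append((out[-1] + d) % 2 ** 64)
    return out

data = [1.0, 1.0000001, 1.0000002, 1.0000004]
restored = bits_to_doubles(delta_decode(delta_encode(doubles_to_bits(data))))
assert restored == data  # lossless round trip
```

Because every stage is invertible and stateless across elements, chains of such components can be searched mechanically, which is in the spirit of the 138,240-algorithm design-space study the paper describes.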
{"title":"SideWalk: A Facility of Lightweight Out-of-Band Communications for Augmenting Distributed Data Processing Flows","authors":"Yin Huai, Yuan Yuan, Rubao Lee, Xiaodong Zhang","doi":"10.1109/CLUSTER.2015.43","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.43","url":null,"abstract":"The foundation of a data processing engine running on a large cluster is its programming model, which defines data processing operations and data movements. A special kind of communication activity that is not normally defined in the programming model but is often used in ad hoc ways during system development is called out-of-band communication. Existing ad hoc solutions for out-of-band communications are often hard to reuse, error-prone, and not free from unwanted side effects. To address these issues, we have designed and implemented a standalone facility for out-of-band communications called SideWalk. With this facility, users can add out-of-band communication operations to their distributed data flows through a set of reusable APIs. These APIs have well-defined semantics, and thus users' chances of writing error-prone programs with SideWalk are minimized. To prevent users from introducing unwanted side effects while using SideWalk, we prototype SideWalk to efficiently handle lightweight out-of-band communications, and we restrict the communication patterns that can be conducted through SideWalk without affecting its applicability to typical use cases. Our experimental results show that execution times of distributed data processing flows in a Hadoop environment, with out-of-band communications implemented using SideWalk, are reduced by up to 1.53x compared with those of flows whose out-of-band communications are implemented with a representative ad hoc solution.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127850355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}