{"title":"Real Time Visualization of Monitoring Data for Large Scale HPC Systems","authors":"M. Showerman","doi":"10.1109/CLUSTER.2015.122","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.122","url":null,"abstract":"High Performance Computing (HPC) system users and administrators are often hampered in their ability understand application performance and system behavior due to a lack of sufficient information about how resources, such as memory, CPU, networks and filesystems are being used. While obtaining the related data is a necessary step, it is insufficient without tools that can turn the data into actionable information. Required capabilities of such tools are the ability to efficiently handle vast amounts of data in a timely fashion, the presentation of effective and understandable information representations for large node counts, and the correlation of that data with job and system events. This paper presents visualization approaches and tools that NCSA is developing, combined with the use of freely available web interfaces, to turn the eight billion platform related data points per day being collected from their 27,648 compute node Blue Waters platform into actionable information for both system administrators and users. Insights from the visualizations both at the system and the job levels are also presented.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"296 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124246783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting Spatial Information in Datasets to Enable Fault Tolerant Sparse Matrix Solvers","authors":"Rob Hunt, Simon McIntosh-Smith","doi":"10.1109/CLUSTER.2015.102","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.102","url":null,"abstract":"High-performance computing (HPC) systems continue to increase in size in the quest for ever higher performance. The resulting increased electronic component count, coupled with the decrease in feature sizes of the silicon manufacturing processes used to build these components, will result in future Exascale systems being more susceptible to soft errors caused by cosmic radiation than current HPC systems. Through the use of techniques such as hardware-based error-correcting codes (ECC) and checkpoint-restart, many of these faults can be mitigated, but at the cost of increased hardware overhead, run-time, and energy consumption that can be as much as 10 - 20%. For extreme scale systems, these overheads will represent megawatts of power consumption and millions of dollars of additional hardware cost, which could potentially be avoided with more sophisticated fault-tolerance techniques. In this paper we present a new software-based fault tolerance technique that can be applied to one of the most important classes of software in HPC: sparse matrix solvers. Our new technique enables us to exploit knowledge of the structure of sparse matrices in such a way as to improve the performance, energy efficiency and fault tolerance of the overall solution.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117351811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introducing and Exploiting Hierarchical Structural Information","authors":"D. Bonilla, C. W. Glass, J. Kuper, R. D. Groote","doi":"10.1109/CLUSTER.2015.133","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.133","url":null,"abstract":"This paper presents a programming model approach that explicitly addresses the programmability of scientific code by annotating imperative code with its algorithmic structural behavior. This information is used to create hierarchical structures, as opposed to the flat structure that most programming models work with, which allows sound code transformation at any level of the code, adjusting the granularity of parallelization simultaneously with other parameters to better exploit the available hardware resources.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122032397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Execution of Computationally Intensive CPU-Based Libraries on Remote Accelerators for Increasing Performance: Early Experience with the OpenBLAS and FFTW Libraries","authors":"S. Valero, F. Silla","doi":"10.1109/CLUSTER.2015.111","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.111","url":null,"abstract":"Virtualization techniques have shown to report benefits to data centers and other computing facilities. In this regard, virtual machines not only allow reducing the size of the computing infrastructure while increasing overall resource utilization but virtualizing individual components of computers may also provide significant benefits. This is the case, for example, for the remote GPU virtualization technique, implemented in several frameworks during the last years. In this paper we present an initial implementation of a new middleware for the remote virtualization of another component of computers: the CPU itself. Our proposal uses remote accelerators to perform computations that were initially intended to be carried out in the local CPUs, doing so transparently to the application and without having to modify its source code. By making use of the OpenBLAS and FFTW libraries as case studies to show the performance gains of our proposal, we carry out a performance evaluation targeting several system configurations comprising Xeon processors as well as Ethernet and InfiniBand QDR, FDR, and EDR network adapters in addition to NVIDIA Tesla K40 GPUs. Results not only demonstrate that the new middleware is feasible, but they also show that mathematical libraries may experience a significant speed up, despite of having to move data forth and back to/from remote servers.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116996842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PLB-HeC: A Profile-Based Load-Balancing Algorithm for Heterogeneous CPU-GPU Clusters","authors":"Luis Sant'Ana, Daniel Cordeiro, R. Camargo","doi":"10.1109/CLUSTER.2015.24","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.24","url":null,"abstract":"The use of GPU clusters for scientific applications in areas such as physics, chemistry and bioinformatics is becoming more widespread. These clusters frequently have different types of processing devices, such as CPUs and GPUs, which can themselves be heterogeneous. To use these devices in an efficient manner, it is crucial to find the right amount of work for each processor that balances the computational load among them. This problem is not only NP-hard on its essence, but also tricky due to the variety of architectures of those devices. We present PLB-HeC, a Profile-based Load-Balancing algorithm for Heterogeneous CPU-GPU Clusters that performs an online estimation of performance curve models for each GPU and CPU processor. Its main difference to existing algorithms is the generation of a non-linear system of equations representing the models and its solution using a interior point method, improving the accuracy of block distribution among processing units. We implemented the algorithm in the StarPU framework and compared its performance with existing load-balancing algorithms using applications from linear algebra, stock markets and bioinformatics. We show that it reduces the application execution times in almost all scenarios, when using heterogeneous clusters with two or more machine configurations.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127127463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Empirical Comparison of Three Versioning Architectures","authors":"H. Fujita, K. Iskra, P. Balaji, A. Chien","doi":"10.1109/CLUSTER.2015.69","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.69","url":null,"abstract":"Future supercomputer systems will face serious reliability challenges. Among failure scenarios, latent errors are some of the most serious and concerning. Preserving multiple versions of critical data is a promising approach to deal with such errors. We are developing the Global View Resilience (GVR) library, with multi-version global arrays as one of the key features. This paper presents three array versioning architectures: flat array, flat array with change tracking, and log-structured array. We use a synthetic workload comparing the three array architectures in terms of runtime performance and memory requirements. The experiments show that the flat array with change tracking is the best architecture in terms of runtime performance, for versioning frequencies of 10-5 ops-1 or higher matching the second best architecture or beating it by over 8 times, whereas the log-structured array is preferable for low memory usage, since it saves up to 88% of memory compared with a flat array.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132861121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of Parallel Communication Models in Nekbone, a Nek5000 Mini-Application","authors":"I. Ivanov, Jing Gong, D. Akhmetova, I. Peng, S. Markidis, E. Laure, Rui Machado, M. Rahn, Valeria Bartsch, A. Hart, P. Fischer","doi":"10.1109/CLUSTER.2015.131","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.131","url":null,"abstract":"Nekbone is a proxy application of Nek5000, a scalable Computational Fluid Dynamics (CFD) code used for modelling incompressible flows. The Nekbone mini-application is used by several international co-design centers to explore new concepts in computer science and to evaluate their performance. We present the design and implementation of a new communication kernel in the Nekbone mini-application with the goal of studying the performance of different parallel communication models. First, a new MPI blocking communication kernel has been developed to solve Nekbone problems in a three-dimensional Cartesian mesh and process topology. The new MPI implementation delivers a 13% performance improvement compared to the original implementation. The new MPI communication kernel consists of approximately 500 lines of code against the original 7,000 lines of code, allowing experimentation with new approaches in Nekbone parallel communication. Second, the MPI blocking communication in the new kernel was changed to the MPI non-blocking communication. Third, we developed a new Partitioned Global Address Space (PGAS) communication kernel, based on the GPI-2 library. This approach reduces the synchronization among neighbor processes and is on average 3% faster than the new MPI-based, non-blocking, approach. In our tests on 8,192 processes, the GPI-2 communication kernel is 3% faster than the new MPI non-blocking communication kernel. In addition, we have used the OpenMP in all the versions of the new communication kernel. Finally, we highlight the future steps for using the new communication kernel in the parent application Nek5000.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130957759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evolution of Monitoring over the Lifetime of a High Performance Computing Cluster","authors":"A. DeConinck, K. Kelly","doi":"10.1109/CLUSTER.2015.123","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.123","url":null,"abstract":"High Performance Computer (HPC) systems typically have lifetimes of four to six years. During this lifetime a system will undergo substantial changes in the system software stack and hardware configuration. Simultaneously, the physical environment around it will change as old systems are retired and new systems are brought in. This report focuses on our experience with Mustang, a 1600 node Linux cluster at LANL. Over the three years we have operated Mustang, the machine and environment have changed substantially, which has resulted in reliability and stability issues on the cluster. In this report we present our experiences with standard monitoring and analysis tools available on Mustang since its installation, and how recent advances in our tools and usage have improved our ability to troubleshoot these issues and perform timely root cause analysis. These advances have both improved our management of existing installations as well as informed our hardware and tooling requirements for future systems.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123465324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters","authors":"Khaled Hamidouche, Akshay Venkatesh, A. Awan, H. Subramoni, Ching-Hsiang Chu, D. Panda","doi":"10.1109/CLUSTER.2015.21","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.21","url":null,"abstract":"GPUDirect RDMA (GDR) brings the high-performance communication capabilities of RDMA networks like InfiniBand (IB) to GPUs (referred to as \"Device\"). It enables IB network adapters to directly write/read data to/from GPU memory. Partitioned Global Address Space (PGAS) programming models, such as OpenSHMEM, provide an attractive approach for developing scientific applications with irregular communication characteristics by providing shared memory address space abstractions, along with one-sided communication semantics. However, current approaches and designs of OpenSHMEM on GPU clusters do not take advantage of the GDR features leading to inefficiencies and sub-optimal performance. In this paper, we analyze the performance of various OpenSHMEM operations with different inter-node and intra-node communication configurations (Host-to-Device, Device-to-Device, and Device-to-Host) on GPU based systems. We propose novel designs that ensure \"truly one-sided\" communication for the different inter-/intra-node configurations identified above while working around the hardware limitations. To the best of our knowledge, this is the first work that investigates GDR-aware designs for OpenSHMEM communication operations. Experimental evaluations indicate 2.5X and 7X improvement in point-point communication for intra-node and inter-node, respectively. The proposed framework achieves 2.2μs for an intra-node 8 byte put operation from Host-to-Device, and 3.13μs for an inter-node 8 byte put operation from GPU to remote GPU. With Stencil2D application kernel from SHOC benchmark suite, we observe a 19% reduction in execution time on 64 GPU nodes. Further, for GPULBM application, we are able to improve the performance of the evolution phase by 53% and 45% on 32 and 64 GPU nodes, respectively.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126056270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Approach to Selecting Thread + Process Mixes for Hybrid MPI + OpenMP Applications","authors":"Hormozd Gahvari, M. Schulz, U. Yang","doi":"10.1109/CLUSTER.2015.64","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.64","url":null,"abstract":"Hybrid MPI + OpenMP is a popular means of programming modern machines that feature substantial parallelism both off-node and on-node. Determining the right mix of the two programming models to use, however, is not as straightforward as simply using exclusively OpenMP on-node and limiting MPI to only inter-node communication. We present a step-by-step methodology to help make the decision of which mix of the two programming models to use. It starts with an estimate of the performance of a generic hybrid application on a given machine and incorporates additional available information about the specific application and the machine to provide guidance for selecting effective mixes of MPI processes and OpenMP threads to use when running that application on the machine in question. We validate our approach on four different applications on an IBM Blue Gene/Q, a Cray XK7, and a Cray XC30.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114214991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}