Latest Publications — 2015 IEEE International Conference on Cluster Computing

Re-evaluating Network Onload vs. Offload for the Many-Core Era
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.55
Matthew G. F. Dosanjh, Ryan E. Grant, P. Bridges, R. Brightwell
Abstract: This paper explores the trade-offs between onloaded and offloaded network stack processing for systems with varying CPU frequencies. The study compares onload and offload using experiments run at different DVFS settings to change the frequency, while measuring performance and power. This allows a quantitative comparison of the performance and power trade-offs between onload and offload cards across a wide range of CPU speeds. The results show that offloaded cards often deliver a significant performance increase, especially at lower CPU frequencies, with only a small increase in power usage. The study also uses MPI profiling to analyze why some applications benefit more than others. The paper's contribution is an analytical, quantitative analysis of the onload/offload trade-off; while the question has long been debated, this is, to the authors' knowledge, the first analytical evaluation of the performance difference. The range of frequencies analyzed gives insight into how MPI might perform on other architectures, such as low-frequency, many-core CPUs. Finally, the power measurements add further depth to the analysis.
Citations: 16
Performance-to-Power Ratio Aware Virtual Machine (VM) Allocation in Energy-Efficient Clouds
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.46
X. Ruan, Haiquan Chen
Abstract: The last decade has witnessed dramatic advances in cloud computing research and techniques. One of the key challenges in this field is reducing the massive energy consumption of cloud computing data centers. To address this issue, many power-aware virtual machine (VM) allocation and consolidation approaches have been proposed to reduce energy consumption, but most existing solutions save energy at the price of significant performance degradation. In this paper, we present a novel VM allocation algorithm called "PPRGear", which leverages the performance-to-power ratios of various host types. By achieving the optimal balance between host utilization and energy consumption, PPRGear guarantees that host computers run at their most power-efficient levels (i.e., the levels with the highest performance-to-power ratios), so that energy consumption is greatly reduced with little sacrifice of performance. Our extensive experiments with real-world traces show that, compared with three baseline energy-efficient VM allocation and selection algorithms, PPRGear reduces energy consumption by up to 69.31% across host computer types, with fewer migrations and shutdowns and little performance degradation for cloud computing data centers.
Citations: 24
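The core idea behind PPRGear — placing each VM on the host whose resulting utilization level yields the best performance-to-power ratio — can be sketched as a greedy rule. This is an illustrative toy, not the paper's algorithm: the host profiles, field names, and the linear perf/power curves used here are assumptions.

```python
# Hypothetical sketch of performance-to-power-ratio (PPR) guided VM placement.
# Host dictionaries carry a current utilization plus perf/power curves; the
# greedy rule and all names are illustrative, not the paper's PPRGear.

def ppr(host, utilization):
    """Performance-to-power ratio of a host at a given utilization level."""
    perf = host["perf_at_util"](utilization)
    power = host["power_at_util"](utilization)
    return perf / power

def place_vm(hosts, vm_load):
    """Greedily pick the host whose post-placement utilization has the best PPR."""
    best_host, best_ppr = None, -1.0
    for host in hosts:
        new_util = host["util"] + vm_load
        if new_util > 1.0:          # host cannot absorb this VM
            continue
        score = ppr(host, new_util)
        if score > best_ppr:
            best_host, best_ppr = host, score
    if best_host is not None:
        best_host["util"] += vm_load
    return best_host
```

With a host whose power curve is mostly load-proportional and one dominated by a high idle floor, the rule favors the former at moderate loads, which matches the intuition of running hosts at their most power-efficient operating points.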
Toward Auto-tuned Krylov Basis Computation for Different Sparse Matrix Formats and Interconnects on GPU Clusters
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.153
Langshi Chen, Serge G. Petition
Abstract: Krylov subspace methods (KSMs) are widely used to solve large-scale sparse linear problems. The orthogonalization process in methods such as GMRES can consume a majority of the runtime. While modern many-core accelerators provide great computational horsepower, communication overheads remain a bottleneck, especially in clusters with a large number of nodes. The HA-PACS/TCA at the University of Tsukuba is a CPU-GPU hybrid cluster equipped with different interconnects for communication among GPUs. We test a group of Krylov basis computation methods with different sparse matrices and interconnects on HA-PACS/TCA. The results show that an auto-tuning scheme is required to handle the various types of matrices.
Citations: 0
Distributed Modular Monitoring (DiMMon) Approach to Supercomputer Monitoring
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.83
K. Stefanov, V. Voevodin
Abstract: In this work we propose a design for a new distributed modular monitoring framework that combines both monitoring tasks (supercomputer health monitoring and performance monitoring) in a single monitoring system. Our approach allows each part of the monitoring system to process only the data needed for its assigned task. Another feature of the framework is the ability to compute performance metrics on the fly, dynamically creating processing modules for every job or other object of interest.
Citations: 4
Towards Building Resilient Scientific Applications: Resilience Analysis on the Impact of Soft Error and Transient Error Tolerance with the CLAMR Hydrodynamics Mini-App
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.35
Qiang Guan, Nathan Debardeleben, Brian Atkinson, R. Robey, William M. Jones
Abstract: In this paper, we present a resilience analysis of the impact of soft errors on CLAMR, a hydrodynamics mini-app for high-performance computing (HPC). Leveraging the law of conservation of mass, we design a fault detection mechanism and a checkpoint/restart fault tolerance approach to enhance CLAMR's resilience. Overall, our approach detects up to 88.3% of faults that propagate into silent data corruption (SDC) or crashes, with minimal (less than 1%) overhead in the optimal configuration. We show that CLAMR's fault tolerance depends on when a fault is injected into the simulation, and we also evaluate the effect of detection and checkpointing frequency on performance.
Citations: 14
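The detection idea — total mass should be conserved across a timestep, so a drift beyond tolerance signals a soft error and triggers rollback to the last checkpoint — can be sketched as follows. This is a minimal illustration, not the CLAMR implementation; the cell layout, tolerance, and function names are assumptions.

```python
# Illustrative conservation-of-mass soft-error detector with rollback to the
# last checkpoint. Cell structure and the `advance` callback are hypothetical.

def total_mass(cells):
    """Total mass = sum of density * volume over all cells."""
    return sum(c["density"] * c["volume"] for c in cells)

def step_with_detection(cells, advance, checkpoint, tol=1e-9):
    """Advance one timestep; if total mass drifts beyond tol, restore checkpoint.

    Returns (new_state, detected): the state to continue from and whether a
    conservation violation (likely SDC) was detected.
    """
    mass_before = total_mass(cells)
    new_cells = advance(cells)
    if abs(total_mass(new_cells) - mass_before) > tol * max(mass_before, 1.0):
        # Conservation violated: discard the step, restore the checkpoint copy.
        return [dict(c) for c in checkpoint], True
    return new_cells, False
```

A real detector must account for legitimate mass flux through boundaries and floating-point drift; the paper's reported overhead (under 1%) comes from the check being a cheap reduction over state that is already resident.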
Evaluating R-Based Big Data Analytic Frameworks
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.86
Mei Liang, C. Trejo, Lavanya Muthu, Linh Ngo, André Luckow, A. Apon
Abstract: We study two approaches, rHadoop and H2O, for integrating R, a popular statistical programming environment, into the Hadoop big data ecosystem. Using these approaches and a vanilla MapReduce implementation to answer an analytic question on the on-time airline performance data set, we evaluate the differences in runtime performance and explain their causes in terms of the design principles of rHadoop and H2O.
Citations: 11
BPS: A Balanced Partial Stripe Write Scheme to Improve the Write Performance of RAID-6
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.39
Congjin Du, Chentao Wu, Jie Li, M. Guo, Xubin He
Abstract: RAID is widely used today for its large capacity, high performance, and high reliability. With the growing reliability requirements of storage systems and the rapid development of cloud computing, RAID-6, which tolerates the concurrent failure of any two disks, is receiving more attention than ever. However, the write performance of RAID-6 systems is a bottleneck for many applications. Over the last two decades, many approaches have been proposed to improve RAID-6 write performance, but they suffer from limitations such as unbalanced I/O distribution and high I/O cost. To address this problem, we propose a Balanced Partial Stripe (BPS) write scheme. The basic idea of BPS is to reorganize the distribution of write data blocks based on a global view of the parities they modify, and to flush these blocks to the storage devices at once. This significantly reduces the total number of parity updates and balances the I/O workload. BPS has three main advantages: 1) it decreases the number of I/O operations and aggregates fragmented I/Os, improving I/O performance; 2) it provides a balanced partial stripe write approach for RAID-6; and 3) it can be applied with various erasure codes. To demonstrate the effectiveness of our scheme, we conduct simulations on DiskSim to evaluate different partial stripe write approaches. The results show that, compared to typical partial stripe write approaches, BPS reduces the average access time by up to 37.14% and decreases the number of write operations by up to 26.24%.
Citations: 3
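The parity-update saving from batching partial-stripe writes can be made concrete with a toy count: writing blocks one at a time updates a stripe's P and Q parities once per block, while grouping the same blocks by stripe updates each touched stripe's parities once per flush. This is a counting model only — not the BPS algorithm, whose contribution is also balancing the resulting I/O — and the stripe width is an arbitrary assumption.

```python
# Toy model of why batching partial-stripe writes reduces RAID-6 parity
# updates. Not the BPS scheme itself; STRIPE_WIDTH is illustrative.
from collections import defaultdict

STRIPE_WIDTH = 4  # data blocks per stripe (assumed)

def parity_updates_naive(block_addrs):
    """One P and one Q update per individual block write."""
    return 2 * len(block_addrs)

def parity_updates_batched(block_addrs):
    """Group blocks by stripe; one P and one Q update per touched stripe."""
    stripes = defaultdict(list)
    for addr in block_addrs:
        stripes[addr // STRIPE_WIDTH].append(addr)
    return 2 * len(stripes)
```

Five block writes spanning two stripes cost ten parity updates naively but only four when batched, which is the effect BPS exploits at scale.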
Peer Comparison of XSEDE and NCAR Publication Data
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.98
G. Laszewski, Fugang Wang, Geoffrey Fox, David L. Hart, T. Furlani, R. L. Deleon, S. Gallo
Abstract: We present a framework that compares publication impact based on a comprehensive peer analysis of papers produced by scientists using XSEDE and NCAR resources. The analysis introduces a percentile-ranking approach that compares citations of XSEDE and NCAR papers to peer publications in the same journal that do not use these resources. The analysis is unique in that it evaluates the impact of the two facilities by comparing their reported publications to peers from within the same journal issue. We find that papers that utilize XSEDE and NCAR resources are cited statistically significantly more often; hence the reported publications indicate that XSEDE and NCAR resources exert a strong positive impact on scientific research.
Citations: 9
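The percentile-ranking step — locating one paper's citation count within the distribution of its same-issue peers — can be sketched with a standard midpoint percentile rank. The function name and the midpoint convention for ties are assumptions; the paper's framework may rank differently.

```python
# Illustrative percentile rank of a paper's citation count among peer papers
# from the same journal issue. Midpoint handling of ties is an assumption.
from bisect import bisect_left, bisect_right

def percentile_rank(citations, peer_citations):
    """Percent of peers below `citations`, counting ties at half weight."""
    peers = sorted(peer_citations)
    below = bisect_left(peers, citations)
    equal = bisect_right(peers, citations) - below
    return 100.0 * (below + 0.5 * equal) / len(peers)
```

Ranking within the same issue controls for journal prestige and publication date, so a facility paper's high percentile reflects citation advantage rather than venue effects.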
Hybrid Communication with TCA and InfiniBand on a Parallel Programming Language XcalableACC for GPU Clusters
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.112
Tetsuya Odajima, T. Boku, T. Hanawa, H. Murai, M. Nakao, Akihiro Tabuchi, M. Sato
Abstract: For parallel HPC applications on GPU-ready clusters, high inter-node communication latency between GPUs is a serious obstacle to strong scalability. To reduce this latency, we proposed the Tightly Coupled Accelerator (TCA) architecture and developed the PEACH2 board as a proof-of-concept interconnect for TCA. Although PEACH2 provides very low communication latency, its PCIe-based implementation imposes hardware limitations, such as the practical number of nodes in a system — currently 16, forming what is called a sub-cluster. Larger node counts must be connected by conventional interconnects such as InfiniBand, so the overall network is a hybrid of a global conventional network and local high-speed PEACH2 networks. For ease of programming, it is desirable to hide this complexity at the library or language level. In this paper, we develop a hybrid interconnection network combining PEACH2 and InfiniBand, and implement it in XcalableACC (XACC), a high-level PGAS language for accelerated clusters. A preliminary performance evaluation confirms that the hybrid network improves performance on the Himeno stencil benchmark by up to 40% relative to MVAPICH2 with GDR on InfiniBand, and improves Allgather collective communication by up to 50% on networks of 8 to 16 nodes. Combining local communication over the low-latency PEACH2 with global communication over the high-bandwidth, scalable InfiniBand improves overall performance.
Citations: 2
High-Performance, Distributed Dictionary Encoding of RDF Datasets
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.44
Alessandro Morari, Jesse Weaver, Oreste Villa, D. Haglin, Antonino Tumeo, Vito Giovanni Castellana, J. Feo
Abstract: In this work we propose a novel approach to RDF (Resource Description Framework) dictionary encoding that employs a parallel RDF parser and a distributed dictionary data structure with RDF-specific optimizations. In contrast with previous solutions, this approach exploits the Partitioned Global Address Space (PGAS) programming model combined with active messages. We evaluate the performance of our dictionary encoder in our RDF database, GEMS (Graph Engine for Multithreaded Systems), and provide an empirical comparison against previous approaches. The comparison shows that our dictionary encoder scales significantly better and achieves higher performance than the current state of the art, providing a key component for a more efficient RDF database.
Citations: 1
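Dictionary encoding itself is simple: each distinct RDF term is assigned an integer ID, turning variable-length string triples into fixed-width integer tuples. The sketch below is a minimal single-node version for illustration; the paper's contribution — distributing the dictionary across nodes with PGAS and active messages — is deliberately not modeled.

```python
# Minimal single-node sketch of RDF dictionary encoding (not the distributed
# GEMS encoder): terms map to integer IDs, triples become integer tuples.

def encode_triples(triples):
    """Encode (subject, predicate, object) string triples as integer tuples.

    Returns the encoded triples and the ID-to-term table for decoding.
    """
    term_to_id, id_to_term = {}, []
    encoded = []
    for s, p, o in triples:
        ids = []
        for term in (s, p, o):
            if term not in term_to_id:
                term_to_id[term] = len(id_to_term)
                id_to_term.append(term)
            ids.append(term_to_id[term])
        encoded.append(tuple(ids))
    return encoded, id_to_term

def decode_triples(encoded, id_to_term):
    """Invert the encoding back to string triples."""
    return [tuple(id_to_term[i] for i in t) for t in encoded]
```

Integer triples are what make downstream joins and graph traversals fast; the hard part at scale, which the paper addresses, is assigning globally consistent IDs in parallel without a serial bottleneck.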