Latest Publications — 2015 IEEE International Conference on Cluster Computing

Re-evaluating Network Onload vs. Offload for the Many-Core Era
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.55
Matthew G. F. Dosanjh, Ryan E. Grant, P. Bridges, R. Brightwell
Abstract: This paper explores the trade-offs between onloaded and offloaded network stack processing for systems with varying CPU frequencies. The study compares onload and offload using experiments run at different DVFS settings to change the frequency, while measuring performance and power. This allows a quantitative comparison of the performance and power trade-offs between onload and offload cards across a wide range of CPU speeds. The results show that offloaded cards often deliver a significant performance increase, especially at lower CPU frequencies, with only a small increase in power usage. The study also uses MPI profiling to analyze why some applications benefit more than others. The paper's contribution is an analytical, quantitative analysis of the onload/offload trade-off; while the question has long been debated, this is, to the authors' knowledge, the first analytical evaluation of the performance difference. The range of frequencies analyzed gives insight into how MPI might perform on other architectures, such as low-frequency, many-core CPUs. Finally, the power measurements add further depth to the analysis.
Citations: 16
Performance-to-Power Ratio Aware Virtual Machine (VM) Allocation in Energy-Efficient Clouds
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.46
X. Ruan, Haiquan Chen
Abstract: The last decade has witnessed dramatic advances in cloud computing research and techniques. One of the key challenges in this field is reducing the massive energy consumption of cloud computing data centers. To address this issue, many power-aware virtual machine (VM) allocation and consolidation approaches have been proposed to reduce energy consumption, but most existing solutions save energy at the price of significant performance degradation. In this paper, we present a novel VM allocation algorithm called "PPRGear", which leverages the performance-to-power ratios of various host types. By achieving the optimal balance between host utilization and energy consumption, PPRGear guarantees that host computers run at their most power-efficient levels (i.e., the levels with the highest performance-to-power ratios), so that energy consumption is greatly reduced with little sacrifice of performance. Our extensive experiments with real-world traces show that, compared with three baseline energy-efficient VM allocation and selection algorithms, PPRGear reduces energy consumption by up to 69.31% across host computer types, with fewer migrations and shutdowns and little performance degradation for cloud computing data centers.
Citations: 24
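The core idea behind PPRGear — placing each VM on the host whose resulting utilization level yields the best performance-to-power ratio — can be sketched as a greedy rule. This is an illustrative toy, not the paper's algorithm: the host profiles, field names, and the linear perf/power curves used here are assumptions.

```python
# Hypothetical sketch of performance-to-power-ratio (PPR) guided VM placement.
# Host dictionaries carry a current utilization plus perf/power curves; the
# greedy rule and all names are illustrative, not the paper's PPRGear.

def ppr(host, utilization):
    """Performance-to-power ratio of a host at a given utilization level."""
    perf = host["perf_at_util"](utilization)
    power = host["power_at_util"](utilization)
    return perf / power

def place_vm(hosts, vm_load):
    """Greedily pick the host whose post-placement utilization has the best PPR."""
    best_host, best_ppr = None, -1.0
    for host in hosts:
        new_util = host["util"] + vm_load
        if new_util > 1.0:          # host cannot absorb this VM
            continue
        score = ppr(host, new_util)
        if score > best_ppr:
            best_host, best_ppr = host, score
    if best_host is not None:
        best_host["util"] += vm_load
    return best_host
```

With a host whose power curve is mostly load-proportional and one dominated by a high idle floor, the rule favors the former at moderate loads, which matches the intuition of running hosts at their most power-efficient operating points.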
Toward Auto-tuned Krylov Basis Computation for Different Sparse Matrix Formats and Interconnects on GPU Clusters
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.153
Langshi Chen, Serge G. Petition
Abstract: Krylov subspace methods (KSMs) are widely used to solve large-scale sparse linear problems. The orthogonalization process in methods such as GMRES can consume a majority of the runtime. While modern many-core accelerators provide great computational horsepower, communication overheads remain a bottleneck, especially in clusters with a large number of nodes. The HA-PACS/TCA at the University of Tsukuba is a CPU-GPU hybrid cluster equipped with different interconnects for communication among GPUs. We test a group of Krylov basis computation methods with different sparse matrices and interconnects on HA-PACS/TCA. The results show that an auto-tuning scheme is required to handle the various types of matrices.
Citations: 0
Distributed Modular Monitoring (DiMMon) Approach to Supercomputer Monitoring
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.83
K. Stefanov, V. Voevodin
Abstract: In this work we propose a design for a new distributed modular monitoring framework that combines both monitoring tasks (supercomputer health monitoring and performance monitoring) in a single monitoring system. Our approach allows each part of the monitoring system to process only the data needed for its assigned task. Another feature of the framework is the ability to compute performance metrics on the fly, dynamically creating processing modules for every job or other object of interest.
Citations: 4
Towards Building Resilient Scientific Applications: Resilience Analysis on the Impact of Soft Error and Transient Error Tolerance with the CLAMR Hydrodynamics Mini-App
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.35
Qiang Guan, Nathan Debardeleben, Brian Atkinson, R. Robey, William M. Jones
Abstract: In this paper, we present a resilience analysis of the impact of soft errors on CLAMR, a hydrodynamics mini-app for high-performance computing (HPC). Leveraging the law of conservation of mass, we design a fault detection mechanism and a checkpoint/restart fault tolerance approach to enhance CLAMR's resilience. Overall, our approach detects up to 88.3% of faults that propagate into silent data corruption (SDC) or crashes, with minimal (less than 1%) overhead in the optimal configuration. We show that CLAMR's fault tolerance depends on when a fault is injected into the simulation, and we also evaluate the effect of detection and checkpointing frequency on performance.
Citations: 14
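The detection idea — total mass should be conserved across a timestep, so a drift beyond tolerance signals a soft error and triggers rollback to the last checkpoint — can be sketched as follows. This is a minimal illustration, not the CLAMR implementation; the cell layout, tolerance, and function names are assumptions.

```python
# Illustrative conservation-of-mass soft-error detector with rollback to the
# last checkpoint. Cell structure and the `advance` callback are hypothetical.

def total_mass(cells):
    """Total mass = sum of density * volume over all cells."""
    return sum(c["density"] * c["volume"] for c in cells)

def step_with_detection(cells, advance, checkpoint, tol=1e-9):
    """Advance one timestep; if total mass drifts beyond tol, restore checkpoint.

    Returns (new_state, detected): the state to continue from and whether a
    conservation violation (likely SDC) was detected.
    """
    mass_before = total_mass(cells)
    new_cells = advance(cells)
    if abs(total_mass(new_cells) - mass_before) > tol * max(mass_before, 1.0):
        # Conservation violated: discard the step, restore the checkpoint copy.
        return [dict(c) for c in checkpoint], True
    return new_cells, False
```

A real detector must account for legitimate mass flux through boundaries and floating-point drift; the paper's reported overhead (under 1%) comes from the check being a cheap reduction over state that is already resident.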
Evaluating R-Based Big Data Analytic Frameworks
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.86
Mei Liang, C. Trejo, Lavanya Muthu, Linh Ngo, André Luckow, A. Apon
Abstract: We study two approaches, rHadoop and H2O, for integrating R, a popular statistical programming environment, into the Hadoop big data ecosystem. Using these approaches and a vanilla MapReduce implementation to answer an analytic question on the on-time airline performance data set, we evaluate the differences in runtime performance and explain their causes in terms of the design principles of rHadoop and H2O.
Citations: 11
BPS: A Balanced Partial Stripe Write Scheme to Improve the Write Performance of RAID-6
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.39
Congjin Du, Chentao Wu, Jie Li, M. Guo, Xubin He
Abstract: RAID is widely used today for its large capacity, high performance, and high reliability. With the growing reliability requirements of storage systems and the rapid development of cloud computing, RAID-6, which tolerates the concurrent failure of any two disks, is receiving more attention than ever. However, the write performance of RAID-6 systems is a bottleneck for many applications. Over the last two decades, many approaches have been proposed to improve RAID-6 write performance, but they suffer from limitations such as unbalanced I/O distribution and high I/O cost. To address this problem, we propose a Balanced Partial Stripe (BPS) write scheme. The basic idea of BPS is to reorganize the distribution of write data blocks based on a global view of the parities they modify, and to flush these blocks to the storage devices at once. This significantly reduces the total number of parity updates and balances the I/O workload. BPS has three main advantages: 1) it decreases the number of I/O operations and aggregates fragmented I/Os, improving I/O performance; 2) it provides a balanced partial stripe write approach for RAID-6; and 3) it can be applied with various erasure codes. To demonstrate the effectiveness of our scheme, we conduct simulations on DiskSim to evaluate different partial stripe write approaches. The results show that, compared to typical partial stripe write approaches, BPS reduces the average access time by up to 37.14% and decreases the number of write operations by up to 26.24%.
Citations: 3
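The parity-update saving from batching partial-stripe writes can be made concrete with a toy count: writing blocks one at a time updates a stripe's P and Q parities once per block, while grouping the same blocks by stripe updates each touched stripe's parities once per flush. This is a counting model only — not the BPS algorithm, whose contribution is also balancing the resulting I/O — and the stripe width is an arbitrary assumption.

```python
# Toy model of why batching partial-stripe writes reduces RAID-6 parity
# updates. Not the BPS scheme itself; STRIPE_WIDTH is illustrative.
from collections import defaultdict

STRIPE_WIDTH = 4  # data blocks per stripe (assumed)

def parity_updates_naive(block_addrs):
    """One P and one Q update per individual block write."""
    return 2 * len(block_addrs)

def parity_updates_batched(block_addrs):
    """Group blocks by stripe; one P and one Q update per touched stripe."""
    stripes = defaultdict(list)
    for addr in block_addrs:
        stripes[addr // STRIPE_WIDTH].append(addr)
    return 2 * len(stripes)
```

Five block writes spanning two stripes cost ten parity updates naively but only four when batched, which is the effect BPS exploits at scale.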
Peer Comparison of XSEDE and NCAR Publication Data
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.98
G. Laszewski, Fugang Wang, Geoffrey Fox, David L. Hart, T. Furlani, R. L. Deleon, S. Gallo
Abstract: We present a framework that compares publication impact based on a comprehensive peer analysis of papers produced by scientists using XSEDE and NCAR resources. The analysis introduces a percentile-ranking approach that compares citations of XSEDE and NCAR papers to peer publications in the same journal that do not use these resources. The analysis is unique in that it evaluates the impact of the two facilities by comparing their reported publications to peers from within the same journal issue. We find that papers that utilize XSEDE and NCAR resources are cited statistically significantly more often; hence the reported publications indicate that XSEDE and NCAR resources exert a strong positive impact on scientific research.
Citations: 9
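The percentile-ranking step — locating one paper's citation count within the distribution of its same-issue peers — can be sketched with a standard midpoint percentile rank. The function name and the midpoint convention for ties are assumptions; the paper's framework may rank differently.

```python
# Illustrative percentile rank of a paper's citation count among peer papers
# from the same journal issue. Midpoint handling of ties is an assumption.
from bisect import bisect_left, bisect_right

def percentile_rank(citations, peer_citations):
    """Percent of peers below `citations`, counting ties at half weight."""
    peers = sorted(peer_citations)
    below = bisect_left(peers, citations)
    equal = bisect_right(peers, citations) - below
    return 100.0 * (below + 0.5 * equal) / len(peers)
```

Ranking within the same issue controls for journal prestige and publication date, so a facility paper's high percentile reflects citation advantage rather than venue effects.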
Hybrid Communication with TCA and InfiniBand on a Parallel Programming Language XcalableACC for GPU Clusters
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.112
Tetsuya Odajima, T. Boku, T. Hanawa, H. Murai, M. Nakao, Akihiro Tabuchi, M. Sato
Abstract: For parallel HPC applications on GPU-ready clusters, high inter-node communication latency between GPUs is a serious obstacle to strong scalability. To reduce this latency, we proposed the Tightly Coupled Accelerator (TCA) architecture and developed the PEACH2 board as a proof-of-concept interconnect for TCA. Although PEACH2 provides very low communication latency, its PCIe-based implementation imposes hardware limitations, such as the practical number of nodes in a system — currently 16, forming what is called a sub-cluster. Larger node counts must be connected by conventional interconnects such as InfiniBand, so the overall network is a hybrid of a global conventional network and local high-speed PEACH2 networks. For ease of programming, it is desirable to hide this complexity at the library or language level. In this paper, we develop a hybrid interconnection network combining PEACH2 and InfiniBand, and implement it in XcalableACC (XACC), a high-level PGAS language for accelerated clusters. A preliminary performance evaluation confirms that the hybrid network improves performance on the Himeno stencil benchmark by up to 40% relative to MVAPICH2 with GDR on InfiniBand, and improves Allgather collective communication by up to 50% on networks of 8 to 16 nodes. Combining local communication over the low-latency PEACH2 with global communication over the high-bandwidth, scalable InfiniBand improves overall performance.
Citations: 2
High-Performance, Distributed Dictionary Encoding of RDF Datasets
Pub Date: 2015-09-08 · DOI: 10.1109/CLUSTER.2015.44
Alessandro Morari, Jesse Weaver, Oreste Villa, D. Haglin, Antonino Tumeo, Vito Giovanni Castellana, J. Feo
Abstract: In this work we propose a novel approach to RDF (Resource Description Framework) dictionary encoding that employs a parallel RDF parser and a distributed dictionary data structure with RDF-specific optimizations. In contrast with previous solutions, this approach exploits the Partitioned Global Address Space (PGAS) programming model combined with active messages. We evaluate the performance of our dictionary encoder in our RDF database, GEMS (Graph Engine for Multithreaded Systems), and provide an empirical comparison against previous approaches. The comparison shows that our dictionary encoder scales significantly better and achieves higher performance than the current state of the art, providing a key component for a more efficient RDF database.
Citations: 1
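Dictionary encoding itself is simple: each distinct RDF term is assigned an integer ID, turning variable-length string triples into fixed-width integer tuples. The sketch below is a minimal single-node version for illustration; the paper's contribution — distributing the dictionary across nodes with PGAS and active messages — is deliberately not modeled.

```python
# Minimal single-node sketch of RDF dictionary encoding (not the distributed
# GEMS encoder): terms map to integer IDs, triples become integer tuples.

def encode_triples(triples):
    """Encode (subject, predicate, object) string triples as integer tuples.

    Returns the encoded triples and the ID-to-term table for decoding.
    """
    term_to_id, id_to_term = {}, []
    encoded = []
    for s, p, o in triples:
        ids = []
        for term in (s, p, o):
            if term not in term_to_id:
                term_to_id[term] = len(id_to_term)
                id_to_term.append(term)
            ids.append(term_to_id[term])
        encoded.append(tuple(ids))
    return encoded, id_to_term

def decode_triples(encoded, id_to_term):
    """Invert the encoding back to string triples."""
    return [tuple(id_to_term[i] for i in t) for t in encoded]
```

Integer triples are what make downstream joins and graph traversals fast; the hard part at scale, which the paper addresses, is assigning globally consistent IDs in parallel without a serial bottleneck.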