2011 IEEE International Conference on Cluster Computing: Latest Publications

Analyzing the Performance Bottlenecks of the POWER7-IH Network
2011 IEEE International Conference on Cluster Computing. Pub Date: 2011-09-26. DOI: 10.1109/CLUSTER.2011.35
D. Kerbyson, K. Barker
Abstract: In this work we provide an early performance analysis of the communication network in a small-scale POWER7-IH processing system from IBM. Using a set of communication micro-benchmarks, we quantify the achievable bandwidth of the system's communication links, which differ in their peak performance characteristics. We also identify the bottlenecks within the communication network and show that the bandwidth a single node can inject into the network is considerably less than the bandwidth available to the IBM hub chip, which acts as the node's NIC as well as being an integral part of the P7-IH network. Using a communication pattern representative of many scientific applications with regular communication patterns, we show that the default task-to-core assignment on the P7-IH yields sub-optimal performance in most cases. We also show that a diagonal-cyclic assignment, developed in this work to take into account both the network topology and the routing strategy, improves communication performance by up to 75%. We expect even greater improvements in communication performance on larger P7-IH systems.
Citations: 16
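The POWER7-IH abstract argues that task-to-core placement matters for regular communication patterns. The following is a rough illustration only. It counts how many edges of a 2D nearest-neighbor (4-point stencil, non-periodic) exchange cross node boundaries under two placements. The first has consecutive ranks filling one node at a time in row-major order; treating this as a stand-in for a "default" assignment is our assumption. The second places a square tile of the grid on each node. This is not the paper's diagonal-cyclic scheme, which also accounts for the P7-IH topology and routing. All function names are hypothetical.

```python
from typing import Callable

# Hypothetical helpers: (x, y) grid coordinate of a task -> id of the node hosting it.
Mapping = Callable[[int, int], int]


def block_rows(px: int, cores_per_node: int) -> Mapping:
    """Row-major rank numbering; consecutive ranks fill one node before the next."""
    return lambda x, y: (y * px + x) // cores_per_node


def tiles(tx: int, ty: int, px: int) -> Mapping:
    """Place each tx-by-ty tile of a px-wide grid on its own node."""
    tiles_per_row = px // tx
    return lambda x, y: (y // ty) * tiles_per_row + (x // tx)


def offnode_edges(px: int, py: int, node_of: Mapping) -> int:
    """Count stencil edges whose two endpoints live on different nodes,
    i.e. neighbor exchanges that must travel over the network."""
    count = 0
    for y in range(py):
        for x in range(px):
            if x + 1 < px and node_of(x, y) != node_of(x + 1, y):
                count += 1
            if y + 1 < py and node_of(x, y) != node_of(x, y + 1):
                count += 1
    return count
```

On a 4 x 4 grid with 4 cores per node, `block_rows(4, 4)` gives 12 off-node edges and `tiles(2, 2, 4)` gives 8. Fewer off-node messages is only a proxy. On the P7-IH, how that traffic is spread over links with different peak bandwidths also matters.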
Parallel Greedy Genetic Algorithm for Job Scheduling in Cluster Enviornments
2011 IEEE International Conference on Cluster Computing. Pub Date: 2011-09-26. DOI: 10.1109/CLUSTER.2011.57
Gholamali Rahnavard, Jharrod Lafon, Hadi Sharifi
Abstract: Many scientific researchers and applications now work on large amounts of data or use high-performance computing resources. High-performance clusters are developed to handle massively parallel processes. To manage resources for dynamic requests with optimal usage, cluster utilization must be maximized. In this paper we present a parallel genetic algorithm that schedules jobs for different classes of clusters. A greedy approach is used to create the initial population for the genetic algorithm. We applied the master/slave method of parallelism to manage the schedulers and improve the performance of the main scheduler. Complexity analysis shows that the algorithm can be more efficient than similar algorithms.
Citations: 3
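The abstract names three ingredients: a greedy initial population, a genetic algorithm, and master/slave parallelism. It does not name the specific greedy heuristic, encoding, or operators. The sketch below therefore makes its own assumptions throughout. It uses longest-processing-time-first as the greedy seed, encodes a schedule as job-to-machine assignments, and minimizes makespan using tournament selection, uniform crossover, mutation, and elitism. Evaluation is serial here; the comment marks the fitness step that a master/slave scheme would distribute to workers. All names are hypothetical.

```python
import random
from typing import List

Schedule = List[int]  # schedule[j] is the machine that job j runs on


def makespan(schedule: Schedule, times: List[int], machines: int) -> int:
    """Finish time of the most heavily loaded machine."""
    loads = [0] * machines
    for job, m in enumerate(schedule):
        loads[m] += times[job]
    return max(loads)


def greedy_schedule(times: List[int], machines: int) -> Schedule:
    """Longest-processing-time-first: assign jobs, largest first,
    to the currently least-loaded machine."""
    loads = [0] * machines
    schedule = [0] * len(times)
    for job in sorted(range(len(times)), key=lambda j: -times[j]):
        m = loads.index(min(loads))
        schedule[job] = m
        loads[m] += times[job]
    return schedule


def mutate(s: Schedule, machines: int, rng: random.Random, rate: float) -> Schedule:
    """Reassign each job to a random machine with probability `rate`."""
    return [rng.randrange(machines) if rng.random() < rate else m for m in s]


def genetic_schedule(times: List[int], machines: int, pop_size: int = 20,
                     generations: int = 50, seed: int = 0) -> Schedule:
    rng = random.Random(seed)

    def fitness(s: Schedule) -> int:
        # In a master/slave GA, these evaluations are what the workers would compute.
        return makespan(s, times, machines)

    greedy = greedy_schedule(times, machines)
    # Greedy seeding: the greedy schedule plus mutated copies of it.
    pop = [greedy] + [mutate(greedy, machines, rng, 0.2) for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness)
        children = [pop[0]]  # elitism: keep the best schedule found so far
        while len(children) < pop_size:
            # Tournament selection of two parents, then uniform crossover.
            a = min(rng.sample(pop, 3), key=fitness)
            b = min(rng.sample(pop, 3), key=fitness)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            children.append(mutate(child, machines, rng, 0.1))
        pop = children
    return min(pop, key=fitness)
```

Because of elitism, the result is never worse than the greedy seed. For job times `[5, 4, 3, 3, 3]` on 2 machines, the greedy seed has makespan 10.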
An ISO-Energy-Efficient Approach to Scalable System Power-Performance Optimization
2011 IEEE International Conference on Cluster Computing. Pub Date: 2011-09-26. DOI: 10.1109/CLUSTER.2011.37
S. Song, M. Grove, K. Cameron
Abstract: The power consumption of a large-scale system ultimately limits its performance. Consuming less energy while preserving performance leads to better system utilization at scale. The iso-energy-efficiency model was proposed as a metric and methodology for explaining power and performance efficiency on scalable systems. To use it in practice, we need to determine which parameters should be modified to maintain a desired efficiency. Unfortunately, without extension, the iso-energy-efficiency model cannot be used for this purpose. In this paper we extend the iso-energy-efficiency model to identify appropriate efficiency values for workload and power scaling on clusters. We propose the use of "correlation functions" to quantitatively explain the isolated and interacting effects of these two parameters for three representative applications: LINPACK, row-oriented matrix multiplication, and 3D Fourier transform. We show quantitatively that the iso-energy-efficiency model with correlation functions is effective at maintaining efficiency as system size scales.
Citations: 19
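To make "maintaining efficiency" concrete, the sketch below assumes the energy-efficiency ratio EE = E_1 / E_p. Here E_1 is the energy of the sequential run, and E_p is the total energy the p-processor run spends on the same workload, so EE = 1 means no energy overhead. It then filters measured configurations against a target EE. This is only the ratio itself. It does not reproduce the paper's extended model or its correlation functions, and the helper names are hypothetical.

```python
from typing import Dict, List


def energy_efficiency(e_serial: float, e_parallel: float) -> float:
    """Assumed definition EE = E_1 / E_p (1.0 means no parallel energy overhead)."""
    if e_parallel <= 0:
        raise ValueError("parallel energy must be positive")
    return e_serial / e_parallel


def configs_meeting_target(e_serial: float, measured: Dict[int, float],
                           target: float) -> List[int]:
    """Return the processor counts whose measured total energy keeps EE >= target."""
    return sorted(p for p, e_p in measured.items()
                  if energy_efficiency(e_serial, e_p) >= target)
```

For example, with E_1 = 100 J and hypothetical measurements of 105 J, 120 J, and 160 J at 2, 4, and 8 processors, a target of 0.8 admits only 2 and 4 processors. An iso-efficiency approach would then adjust workload size or power settings so that larger counts also meet the target.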
Design and Implementation of Broadcast Algorithms for Extreme-Scale Systems
2011 IEEE International Conference on Cluster Computing. Pub Date: 2011-09-26. DOI: 10.1109/CLUSTER.2011.17
Pavel Shamis, R. Graham, Manjunath Gorentla Venkata, Joshua Ladd
Abstract: The scalability and performance of collective communication operations limit the scalability and performance of many scientific applications. This paper presents two new blocking and nonblocking Broadcast algorithms for communicators with arbitrary communication topology and studies their performance. These algorithms benefit from increased concurrency and a reduced memory footprint, making them suitable for use on large-scale systems. For small, medium, and large data Broadcasts on a Cray-XT5 with 24,576 MPI processes, the Cheetah algorithms outperform the system's native MPI by 51%, 69%, and 9%, respectively, at the same process count. These results demonstrate an algorithmic approach to implementing the important class of collective communications that is high-performing and scalable and also uses resources in a scalable manner.
Citations: 4
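For readers unfamiliar with tree-based broadcasts, the sketch below computes the communication schedule of a textbook binomial-tree broadcast. This is the usual baseline for comparing broadcast algorithms; it is not the Cheetah algorithms from this paper, whose design the abstract does not detail. In each round, every rank that already holds the data sends it to one new rank, so the number of holders doubles and ceil(log2 n) rounds suffice. The function name is hypothetical.

```python
from typing import List, Tuple


def binomial_bcast_schedule(n: int, root: int = 0) -> List[List[Tuple[int, int]]]:
    """Return, per round, the (sender, receiver) pairs of a binomial-tree
    broadcast over ranks 0..n-1 rooted at `root`."""
    rounds = []
    step = 1
    while step < n:
        pairs = []
        # Ranks 0..step-1 (relative to the root) already hold the data.
        for rel in range(step):
            dst_rel = rel + step
            if dst_rel < n:
                pairs.append(((rel + root) % n, (dst_rel + root) % n))
        rounds.append(pairs)
        step *= 2
    return rounds
```

For 8 ranks rooted at 0, the rounds are `[(0, 1)]`, then `[(0, 2), (1, 3)]`, then `[(0, 4), (1, 5), (2, 6), (3, 7)]`.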
An Energy-Efficient Scheme for Cloud Resource Provisioning Based on CloudSim
2011 IEEE International Conference on Cluster Computing. Pub Date: 2011-09-26. DOI: 10.1109/CLUSTER.2011.63
Yuxiang Shi, Xiaohong Jiang, Kejiang Ye
Abstract: Cloud computing has recently received considerable attention. With the fast development of cloud computing, data centers are growing larger in scale and consuming more energy. There is an urgent need for efficient energy-saving methods to reduce the huge energy consumption of cloud data centers. In this paper, we achieve this goal by dynamically allocating resources based on utilization analysis and prediction. We use the "Linear Predicting Method" (LPM) and the "Flat Period Reservation-Reduced Method" (FPRRM) to extract useful information from the resource utilization log, giving the M/M/1 queuing-theory prediction method better response time and lower energy consumption. Experimental evaluation performed on the CloudSim cloud simulator shows that the proposed methods can effectively reduce the violation rate and energy consumption in the cloud.
Citations: 84
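The abstract combines utilization prediction with M/M/1 queuing theory but does not give the LPM or FPRRM formulas. The sketch below makes two labeled assumptions. It takes the "linear prediction" to be a least-squares line fitted to recent samples and extrapolated one step ahead. For sizing, it assumes the load is split evenly over k independent M/M/1 servers, each with mean response time 1 / (mu - lambda/k), and picks the smallest k that meets a response-time target. Both helpers are hypothetical.

```python
import math
from typing import Sequence


def linear_predict(history: Sequence[float]) -> float:
    """Least-squares fit of y = a + b*t to samples at t = 0..n-1,
    extrapolated one step ahead to t = n."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return y_mean + slope * (n - t_mean)


def servers_needed(arrival_rate: float, service_rate: float, max_response: float) -> int:
    """Smallest k with 1 / (mu - lambda/k) <= max_response, assuming an even
    split of arrivals over k independent M/M/1 servers."""
    if max_response * service_rate <= 1:
        raise ValueError("target is not above a single job's mean service time")
    # 1/(mu - lam/k) <= T  <=>  lam/k <= mu - 1/T  <=>  k >= lam / (mu - 1/T)
    return max(1, math.ceil(arrival_rate / (service_rate - 1 / max_response)))
```

One possible use is to feed the predicted arrival rate into the sizing step. For instance, `linear_predict([1, 2, 3, 4])` returns 5.0. At a predicted 90 requests/s, with each server handling 10 requests/s and a 0.5 s target, `servers_needed` returns 12.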
Optimizing Network I/O Virtualization with Efficient Interrupt Coalescing and Virtual Receive Side Scaling
2011 IEEE International Conference on Cluster Computing. Pub Date: 2011-09-26. DOI: 10.1109/CLUSTER.2011.12
Yaozu Dong, Dongxiao Xu, Yang Zhang, Guangdeng Liao
Abstract: Virtualization is a fundamental component of cloud computing because it provides numerous services that are transparent to guest VMs, such as live migration, high availability, and rapid checkpointing. However, I/O virtualization, particularly for networking, still suffers from significant performance degradation. In this paper, we analyze the performance challenges in network I/O virtualization and observe that conventional network I/O virtualization incurs excessive virtual interrupts to guest VMs, and that the backend driver in the driver domain is not parallelized and cannot leverage the underlying multi-core processors. Motivated by these observations, we propose two optimizations: efficient interrupt coalescing for network I/O virtualization, and virtual receive side scaling to effectively leverage multi-core processors. We implemented these optimizations in Xen and performed an extensive performance evaluation. Our experimental results show that the proposed optimizations significantly improve network I/O virtualization performance and effectively address these performance challenges.
Citations: 38
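To show why coalescing reduces the number of virtual interrupts, the sketch below simulates a generic count/time threshold policy. One interrupt is raised when `max_frames` packets are pending, or when the oldest pending packet has waited `timeout` time units. This is the common textbook form of interrupt coalescing, not necessarily the policy the authors implemented in Xen, and virtual receive side scaling is not modeled. The function name is hypothetical.

```python
from typing import List, Sequence


def coalesce(arrivals: Sequence[float], max_frames: int, timeout: float) -> List[int]:
    """Simulate count/time-based interrupt coalescing.

    Returns the number of packets delivered by each interrupt, so
    len(result) is the number of interrupts raised."""
    batches: List[int] = []
    pending = 0
    first = 0.0  # arrival time of the oldest pending packet
    for t in sorted(arrivals):
        if pending and t - first >= timeout:
            batches.append(pending)  # the timer expired before this packet arrived
            pending = 0
        if pending == 0:
            first = t
        pending += 1
        if pending >= max_frames:
            batches.append(pending)  # frame-count threshold reached
            pending = 0
    if pending:
        batches.append(pending)  # the final timer expiry flushes the remainder
    return batches
```

Eight back-to-back packets with `max_frames=4` need 2 interrupts instead of 8. Sparse traffic, such as arrivals at `[0, 1, 50, 51]` with `timeout=10`, is delivered in timer-driven batches of 2.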
Multiphase LBM Distributed over Multiple GPUs
2011 IEEE International Conference on Cluster Computing. Pub Date: 2011-09-26. DOI: 10.1109/CLUSTER.2011.9
C. Rosales
Abstract: This paper describes a parallel, distributed CUDA implementation of a Lattice Boltzmann Method for multiphase flows with large density ratios. Validation runs studying the terminal velocity of a bubble rising under the effect of gravity show good agreement with the expected theoretical values. The code is benchmarked against a typical CPU implementation of the same algorithm on both AMD and Intel platforms; a single GPU performs up to 10X faster than a quad-core CPU socket, a 40X speedup with respect to a single core. The code scales well when executed on multiple GPUs, which makes the port to CUDA valuable even when compared with parallel CPU implementations.
Citations: 15
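As background on what an LBM kernel computes at each lattice site, the sketch below evaluates the standard single-phase D2Q9 BGK equilibrium distribution. It then recovers density and velocity from it. This is only the basic building block. The abstract does not describe the paper's multiphase, large-density-ratio model or its multi-GPU CUDA decomposition, and neither is shown here. The code is plain Python for readability, not performance.

```python
from typing import List, Tuple

# D2Q9 lattice: discrete velocities e_i and weights w_i (lattice units, c_s^2 = 1/3).
E: List[Tuple[int, int]] = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
                            (1, 1), (-1, 1), (-1, -1), (1, -1)]
W: List[float] = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho: float, ux: float, uy: float) -> List[float]:
    """Second-order BGK equilibrium:
    f_i = w_i * rho * (1 + 3 e.u + 4.5 (e.u)^2 - 1.5 u.u)."""
    usq = ux * ux + uy * uy
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        feq.append(w * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq


def moments(f: List[float]) -> Tuple[float, float, float]:
    """Recover density and velocity (zeroth and first moments) from f."""
    rho = sum(f)
    ux = sum(fi * ex for fi, (ex, _) in zip(f, E)) / rho
    uy = sum(fi * ey for fi, (_, ey) in zip(f, E)) / rho
    return rho, ux, uy
```

The equilibrium is constructed so that its moments reproduce the input density and velocity. That property is a quick sanity check for any LBM implementation, including a GPU port.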
TDP-Shell: A Generic Framework to Improve Interoperability between Batch Queue Systems and Monitoring Tools
2011 IEEE International Conference on Cluster Computing. Pub Date: 2011-09-26. DOI: 10.1109/CLUSTER.2011.73
Vicente Ivars, M. A. Senar, E. Heymann
Abstract: Nowadays, distributed applications, including MPI implementations, are executed on computer clusters managed by a batch queue system. Users rely on monitoring tools to detect run-time problems in their applications running in those environments. However, using monitoring tools on a cluster controlled by a batch queue system is a challenge, because batch queue systems and monitoring tools do not coordinate the management of the resources they share when executing a distributed application. We call this problem a lack of interoperability, and to solve it we have developed a framework called TDP-Shell. This framework supports different batch queue systems, such as Condor and SGE, and different monitoring tools, such as Paradyn, GDB, and TotalView, without any changes to their source code. In this paper we describe how our basic design of TDP-Shell for sequential applications was redesigned to support the monitoring of MPI applications executed on a cluster controlled by a batch queue system.
Citations: 0
Model-Driven Simulation to Evaluate Performance Impact of Workload Features on Parallel Systems
2011 IEEE International Conference on Cluster Computing. Pub Date: 2011-09-26. DOI: 10.1109/CLUSTER.2011.27
T. Minh, L. Wolters
Abstract: In practice, parallel workloads are far from randomly distributed; instead, they are highly repetitive, because users tend to run the same applications over and over again. We refer to this phenomenon as temporal locality. In addition, as analyzed in this paper, the workloads exhibit a correlation between runtime and parallelism (i.e., the number of processors). To the best of our knowledge, there are very few studies on the impact of these features on the performance of parallel systems. Because these impacts are not well known, researchers often evaluate scheduling algorithms with random workloads, which neglect temporal locality and the correlation. This can result in inaccurate scheduling evaluations for parallel systems, because our study shows that these two features can significantly affect scheduling performance. In our simulation-based experiments, an increase in the correlation can quickly degrade parallel system performance and can change the outcome of comparisons between scheduling policies. With respect to temporal locality, we show that this feature does not always seriously affect the schedulers of parallel systems; instead, in particular situations, it can help improve scheduling performance. Furthermore, we discuss the necessity of using workloads with these features in scheduling evaluation, as well as how to utilize these features to enhance scheduler performance.
Citations: 2
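The two workload features in this abstract can be measured on any job trace. The sketch below computes a Pearson correlation between job runtimes and processor counts. It also computes a simple temporal-locality proxy: the fraction of jobs whose (user, application) pair repeats that user's previous job. The abstract does not say how the authors quantified either feature. The repeat-fraction metric is our hypothetical stand-in, and the function names are illustrative.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson (linear) correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def repeat_fraction(jobs: List[Tuple[Hashable, Hashable]]) -> float:
    """Hypothetical locality proxy: fraction of (user, application) jobs
    that match the same user's previous job."""
    last: Dict[Hashable, Hashable] = {}
    repeats = 0
    for user, app in jobs:
        if user in last and last[user] == app:
            repeats += 1
        last[user] = app
    return repeats / len(jobs)
```

A synthetic workload generator could be checked against a real trace by comparing both numbers. Random workloads would typically show a near-zero correlation and a low repeat fraction.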
PIDX: Efficient Parallel I/O for Multi-resolution Multi-dimensional Scientific Datasets
2011 IEEE International Conference on Cluster Computing. Pub Date: 2011-09-26. DOI: 10.1109/CLUSTER.2011.19
Sidharth Kumar, V. Vishwanath, P. Carns, B. Summa, G. Scorzelli, Valerio Pascucci, R. Ross, Jacqueline H. Chen, H. Kolla, R. Grout
Abstract: The IDX data format provides efficient, cache-oblivious, and progressive access to large-scale scientific datasets by storing the data in hierarchical Z (HZ) order. Data stored in IDX format can be visualized in an interactive environment, allowing meaningful exploration with minimal resources. This technology enables real-time, interactive visualization and analysis of large datasets on a variety of systems, ranging from desktop and laptop computers to portable devices such as iPhones and iPads, as well as over the web. While the existing ViSUS API for writing IDX data is serial, there are obvious advantages to applying the IDX format to the output of large-scale scientific simulations. We have therefore developed PIDX, a parallel API for writing data in IDX format. With PIDX it is now possible to generate IDX datasets directly from large-scale scientific simulations, with the added advantage of real-time monitoring and visualization of the generated data. In this paper, we provide an overview of the IDX file format and how it is generated using PIDX. We then present a data model description and a novel aggregation strategy to enhance the scalability of the PIDX library. The S3D combustion application is used as an example to demonstrate the efficacy of PIDX for a real-world scientific simulation; S3D is used for fundamental studies of turbulent combustion requiring exceptionally high-fidelity simulations. PIDX achieves up to 18 GiB/s of I/O throughput at 8,192 processes when S3D writes its data in IDX format. This allows interactive analysis and visualization of S3D data, thus enabling in situ analysis of the S3D simulation.
Citations: 34
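The abstract attributes IDX's progressive access to storing data in hierarchical Z (HZ) order. The sketch below shows the bit manipulation involved, as we understand the HZ construction.

1. A 2D sample's Z-order (Morton) index interleaves the bits of its coordinates.
2. A 1 bit is prepended above the n index bits.
3. The trailing zeros and the lowest 1 bit are shifted out.

The result groups coarse-resolution samples first and finer levels after them. This sketch does not use the ViSUS or PIDX APIs. The function names are hypothetical, as is the convention of placing x bits in the even positions.

```python
def morton2d(x: int, y: int, bits: int) -> int:
    """Z-order index: interleave the low `bits` bits of x and y
    (x in even bit positions, y in odd ones; an assumed convention)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def hz_from_z(z: int, nbits: int) -> int:
    """Map an nbits-bit Z-order index to HZ order: prepend a 1 bit,
    then shift out the trailing zeros and that lowest 1 bit."""
    v = z | (1 << nbits)
    tz = (v & -v).bit_length() - 1  # number of trailing zero bits of v
    return v >> (tz + 1)
```

With 2 index bits, Z indices 0, 1, 2, 3 map to HZ positions 0, 2, 1, 3. The first sample (level 0) comes first, then the midpoint (level 1), then the remaining samples (level 2). A reader that stops early therefore still has a uniformly subsampled view of the data, which is what makes progressive visualization possible.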