Title: Energy-Aware Scheduling in Virtualized Datacenters
Authors: Íñigo Goiri, F. Julià, Ramon Nou, J. L. Berral, Jordi Guitart, J. Torres
DOI: 10.1109/CLUSTER.2010.15
Abstract: The reduction of energy consumption in large-scale datacenters is being accomplished through an extensive use of virtualization, which enables the consolidation of multiple workloads onto a smaller number of machines. Nevertheless, virtualization also incurs additional overheads (e.g., virtual machine creation and migration) that can influence which consolidated configuration is best, and thus must be taken into account. In this paper, we present a dynamic job scheduling policy for power-aware resource allocation in a virtualized datacenter. Our policy tries to consolidate workloads from separate machines onto a smaller number of nodes, while providing the hardware resources needed to preserve the quality of service of each job. This allows turning off the spare servers, thus reducing the overall datacenter power consumption. As a novelty, this policy incorporates all the virtualization overheads in the decision process. In addition, our policy is prepared to consider other important datacenter parameters, such as reliability or dynamic SLA enforcement, in a synergistic way with power consumption. The policy is evaluated against common policies in a simulated environment that accurately models HPC job execution in a virtualized datacenter, including power consumption modeling, and obtains a 15% reduction in power consumption with respect to typical policies.
Title: Asynchronous Algorithms in MapReduce
Authors: Karthik Kambatla, Naresh Rapolu, S. Jagannathan, A. Grama
DOI: 10.1109/CLUSTER.2010.30
Abstract: Asynchronous algorithms have been demonstrated to improve scalability of a variety of applications in parallel environments. Their distributed adaptations have received relatively less attention, particularly in the context of conventional execution environments and associated overheads. One such framework, MapReduce, has emerged as a commonly used programming framework for large-scale distributed environments. While the MapReduce programming model has proved to be effective for data-parallel applications, significant questions relating to its performance and application scope remain unresolved. The strict synchronization between map and reduce phases limits the expression of asynchrony and hence does not readily support asynchronous algorithms. This paper investigates the notion of partial synchronizations in iterative MapReduce applications to overcome global synchronization overheads. The proposed approach applies a locality-enhancing partition on the computation. Map tasks execute local computations with (relatively) frequent local synchronizations and less frequent global synchronizations. This approach yields significant performance gains in distributed environments, even though the resulting algorithms have higher serial operation counts. We demonstrate these performance gains on asynchronous algorithms for diverse applications, including PageRank, shortest path, and k-means. We make the following specific contributions in this paper: (i) we motivate the need to extend MapReduce with constructs for asynchrony, (ii) we propose an API to facilitate partial synchronizations combined with eager scheduling and locality-enhancing techniques, and (iii) we demonstrate performance improvements from our proposed extensions through a variety of applications from different domains.
{"title":"Efficient Parallel Subgraph Counting Using G-Tries","authors":"P. Ribeiro, Fernando M A Silva, Luís M. B. Lopes","doi":"10.1109/CLUSTER.2010.27","DOIUrl":"https://doi.org/10.1109/CLUSTER.2010.27","url":null,"abstract":"Finding and counting the occurrences of a collection of subgraphs within another larger network is a computationally hard problem, closely related to graph isomorphism. The subgraph count is by itself a very powerful characterization of a network and it is crucial for other important network measurements. G-tries are a specialized data-structure designed to store and search for subgraphs. By taking advantage of subgraph common substructure, g-tries can provide considerable speedups over previously used methods. In this paper we present a parallel algorithm based precisely on g-tries that is able to efficiently find and count subgraphs. The algorithm relies on randomized receiver-initiated dynamic load balancing and is able to stop its computation at any given time, efficiently store its search position, divide what is left to compute in two halfs, and resume from where it left. We apply our algorithm to several representative real complex networks from various domains and examine its scalability. We obtain an almost linear speedup up to 128 processors, thus allowing us to reach previously unfeasible limits. We showcase the multidisciplinary potential of the algorithm by also applying it to network motif discovery.","PeriodicalId":152171,"journal":{"name":"2010 IEEE International Conference on Cluster Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129715175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: CDRM: A Cost-Effective Dynamic Replication Management Scheme for Cloud Storage Cluster
Authors: Q. Wei, B. Veeravalli, Bozhao Gong, Lingfang Zeng, D. Feng
DOI: 10.1109/CLUSTER.2010.24
Abstract: Data replication has been widely used as a means of increasing the data availability of large-scale cloud storage systems where failures are normal. Aiming to provide cost-effective availability and to improve the performance and load balancing of cloud storage, this paper presents a cost-effective dynamic replication management scheme referred to as CDRM. A novel model is proposed to capture the relationship between availability and replica number. CDRM leverages this model to calculate and maintain the minimal replica number for a given availability requirement. Replica placement is based on the capacity and blocking probability of data nodes. By adjusting replica number and location according to workload changes and node capacity, CDRM can dynamically redistribute workloads among data nodes in the heterogeneous cloud. We implemented CDRM in the Hadoop Distributed File System (HDFS), and experimental results conclusively demonstrate that CDRM is cost-effective and outperforms the default replication management of HDFS in terms of performance and load balancing for large-scale cloud storage.
{"title":"Replication-Based Highly Available Metadata Management for Cluster File Systems","authors":"Zhuan Chen, Jin Xiong, Dan Meng","doi":"10.1109/CLUSTER.2010.34","DOIUrl":"https://doi.org/10.1109/CLUSTER.2010.34","url":null,"abstract":"In cluster file systems, the metadata management is critical to the whole system. Past researches mainly focus on journaling which alone is not enough to provide high-available metadata service. Some others try to use replication, but the extra latency accompanied is a main problem. To guarantee both availability and efficiency, we propose a mechanism for building highly available metadata servers based on replication, which integrates Paxos algorithm effectively into metadata service. The Packed Multi-Paxos is proposed to reduce the latency brought by replication, which is self-adaptive and can make the replication to achieve high throughput under heavy client load and low latency under light client load. By designing efficient architecture and coordination mechanism, all replica server nodes simultaneously provide metadata read-access service. This high-available mechanism could decrease the impact of server failures and there is no interruption of service. The performance results show that the latency caused by replication and redundancy is well under control, and the performance of metadata read operation gains improvement.","PeriodicalId":152171,"journal":{"name":"2010 IEEE International Conference on Cluster Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130956537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virtualizing Modern High-Speed Interconnection Networks with Performance and Scalability","authors":"Bo Li, Zhigang Huo, P. Zhang, Dan Meng","doi":"10.1109/CLUSTER.2010.19","DOIUrl":"https://doi.org/10.1109/CLUSTER.2010.19","url":null,"abstract":"As one of the most important enabling technologies of cloud computing, virtualization brings to HPC good manageability, online system maintenance, performance isolation and fault isolation. Furthermore, previous study on VMM-bypass I/O that virtualizes OS-bypass networks (e.g. InfiniBand) relieved the worry of performance degradation coming along with virtualization. In this paper, we address the scalability challenges imposed upon OS-bypass networks under virtualized environments. The eXtended Reliable Connection (XRC) transport, proposed in modern high-speed interconnection networks to address the scalability problem in large scale applications, would not work in virtualized environments. To solve the problem, we propose VM-proof XRC design to eliminate the scalability gap between virtualized and native environments. Prototype evaluation shows that the virtualization of modern high-speed interconnection networks could get the same raw performance and scalability as in native non-virtualized environment with our VM-proof XRC design. The connection memory scalability shows a potential of 16 times improvement on virtualized clusters composed of 16-core nodes.","PeriodicalId":152171,"journal":{"name":"2010 IEEE International Conference on Cluster Computing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130681357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cluster versus GPU implementation of an Orthogonal Target Detection Algorithm for Remotely Sensed Hyperspectral Images","authors":"Abel Paz, A. Plaza","doi":"10.1109/CLUSTER.2010.28","DOIUrl":"https://doi.org/10.1109/CLUSTER.2010.28","url":null,"abstract":"Remotely sensed hyperspectral imaging instruments provide high-dimensional data containing rich information in both the spatial and the spectral domain. In many surveillance applications, detecting objects (targets) is a very important task. In particular, algorithms for detecting (moving or static) targets, or targets that could expand their size (such as propagating fires) often require timely responses for swift decisions that depend upon high computing performance of algorithm analysis. In this paper, we develop parallel versions of a target detection algorithm based on orthogonal subspace projections. The parallel implementations are tested in two types of parallel computing architectures: a massively parallel cluster of computers called Thunderhead and available at NASA’s Goddard Space Flight Center in Maryland, and a commodity graphics processing unit (GPU) of NVidia GeForce GTX 275 type. While the cluster-based implementation reveals itself as appealing for information extraction from remote sensing data already transmitted to Earth, the GPU implementation allows us to perform near real-time anomaly detection in hyperspectral scenes, with speedups over 50x with regards to a highly optimized serial version. The proposed parallel algorithms are quantitatively evaluated using hyperspectral data collected by the NASA’s Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS) system over the World Trade Center (WTC) in New York, five days after the attacks that collapsed the two main towers in the WTC complex.","PeriodicalId":152171,"journal":{"name":"2010 IEEE International Conference on Cluster Computing","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127680753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of Tasks Reallocation in a Dedicated Grid Environment","authors":"Y. Caniou, G. Charrier, F. Desprez","doi":"10.1109/CLUSTER.2010.39","DOIUrl":"https://doi.org/10.1109/CLUSTER.2010.39","url":null,"abstract":"In this paper, we study the impact of tasks reallocation onto a multi-cluster environment where clusters are heterogeneous and use different resources management policies. In this context, we propose a reallocation mechanism that migrates waiting jobs from one cluster to another. We performed simulations using real traces to study benefits of reallocations. We compared two algorithms providing the reallocation mechanism, each with several heuristics to schedule jobs. Results show that in some cases it is possible to obtain a substantial gain on the average job response time (more than a factor of two). In the other cases, the reallocation mechanism is beneficial most of the time, making of great interest the implementation of a reallocation mechanism in a Grid framework.","PeriodicalId":152171,"journal":{"name":"2010 IEEE International Conference on Cluster Computing","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115777171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Optimization Techniques at the I/O Forwarding Layer
Authors: Kazuki Ohta, D. Kimpe, Jason Cope, K. Iskra, R. Ross, Y. Ishikawa
DOI: 10.1109/CLUSTER.2010.36
Abstract: I/O is the critical bottleneck for data-intensive scientific applications on HPC systems and leadership-class machines. Applications running on these systems may encounter bottlenecks because the I/O systems cannot handle the overwhelming intensity and volume of I/O requests. Applications and systems use I/O forwarding to aggregate and delegate I/O requests to storage systems. In this paper, we present two optimization techniques at the I/O forwarding layer to further reduce I/O bottlenecks on leadership-class computing systems. The first optimization pipelines data transfers so that I/O requests overlap at the network and file system layers. The second optimization merges I/O requests and schedules I/O request delegation to the back-end parallel file systems. We implemented these optimizations in the I/O Forwarding Scalability Layer and evaluated them on the T2K Open Supercomputer at the University of Tokyo and the Surveyor Blue Gene/P system at the Argonne Leadership Computing Facility. On both systems, the optimizations improved application I/O throughput, but highlighted additional areas of I/O contention at the I/O forwarding layer that we plan to address.
{"title":"Multiplexing Endpoints of HCA for Scaling MPI Applications: Design and Performance Evaluation with uDAPL","authors":"Jasjit Singh, Yogeshwar Sonawane","doi":"10.1109/CLUSTER.2010.22","DOIUrl":"https://doi.org/10.1109/CLUSTER.2010.22","url":null,"abstract":"With an ever increasing demand for computing power, number of nodes to be deployed in a cluster based supercomputer is increasing. Limited hardware resources such as Endpoints (equivalent to Queue Pairs) on a Host Channel Adapter (HCA) of a high speed interconnect limit the scalability of a parallel application based on MPI that sets up reliable connections between every process pair using endpoints, prior to communication. In this paper, we propose a novel approach of multiplexing hardware endpoints (hweps) to extend scalability. (a) We discuss critical design issues with the multiplexing technique that differentiates a hwep from its software counterpart (swep) and enables sharing of hwep by multiple sweps. (b) We introduce the concept of Virtual Identifier (VID) which ensures that the connection between hardware endpoints is strictly one-to-one. (c) We also present static mapping scheme that offsets the overheads incurred due to multiplexing. User Direct Access Programming Library (uDAPL) defines a single set of APIs for all RDMA capable transports. We have incorporated the proposed multiplexing technique as a part of uDAPL implementation. Using this approach, we are able to scale MPI applications beyond the limit imposed by HCA and with no visible performance degradation.","PeriodicalId":152171,"journal":{"name":"2010 IEEE International Conference on Cluster Computing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116232936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}