2012 IEEE International Conference on Cluster Computing: Latest Publications

Adjustable Credit Scheduling for High Performance Network Virtualization
2012 IEEE International Conference on Cluster Computing | Pub Date: 2012-09-24 | DOI: 10.1109/CLUSTER.2012.27
Zhibo Chang, Jian Li, Ruhui Ma, Zhi-Jian Huang, Haibing Guan
Abstract: Virtualization technology is now widely adopted in cloud computing to support heterogeneous and dynamic workloads. The scheduler in a virtual machine monitor (VMM) plays an important role in allocating resources. However, the type of application running in a virtual machine (VM) is unknown to the scheduler, so I/O-intensive and CPU-intensive applications are treated the same. This prevents virtualized systems from taking full advantage of high performance networks such as 10-Gigabit Ethernet. In this paper, we review the SR-IOV networking solution and show by experiment that the current credit scheduler in Xen does not utilize high performance networks efficiently. We therefore propose a novel scheduling model with two optimizations that eliminate the bottleneck caused by the scheduler. In this model, guest domains are divided into I/O-intensive and CPU-intensive domains according to their monitored behavior. I/O-intensive domains can obtain extra credits that CPU-intensive domains are willing to share. In addition, the total number of available credits is adjusted dynamically to accelerate I/O responsiveness. Our experimental evaluation with benchmarks shows that the new scheduling model improves bandwidth even when the system load is very high.
Citations: 12
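The core mechanism here is credit redistribution between domain classes. Below is a minimal C sketch of that idea, not the authors' Xen code; the domain names, the I/O-event threshold, and the 20% sharing ratio are all assumptions made for the example.

```c
/* Minimal sketch of credit sharing between domain classes.  Domains are
 * classified by monitored I/O activity; CPU-bound domains donate a slice
 * of their credits, which is redistributed to I/O-bound domains. */
#include <stdio.h>

#define NDOM 4

typedef struct {
    const char *name;
    int credits;        /* credits for this accounting period      */
    int io_events;      /* monitored I/O events in the last period */
} domain_t;

int main(void) {
    domain_t dom[NDOM] = {
        {"web-vm",   300, 950}, {"db-vm",    300, 870},
        {"sim-vm",   300,  12}, {"batch-vm", 300,   5},
    };
    const int io_threshold = 100;  /* events/period: assumed cutoff */
    const int share_pct    = 20;   /* assumed donation percentage   */

    /* Classify domains and pool the donated credits. */
    int pool = 0, n_io = 0;
    for (int i = 0; i < NDOM; i++) {
        if (dom[i].io_events >= io_threshold) {
            n_io++;
        } else {
            int give = dom[i].credits * share_pct / 100;
            dom[i].credits -= give;
            pool += give;
        }
    }
    /* Redistribute the pool evenly to I/O-intensive domains. */
    for (int i = 0; i < NDOM && n_io > 0; i++)
        if (dom[i].io_events >= io_threshold)
            dom[i].credits += pool / n_io;

    for (int i = 0; i < NDOM; i++)
        printf("%-8s credits=%d\n", dom[i].name, dom[i].credits);
    return 0;
}
```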
HAaaS: Towards Highly Available Distributed Systems
2012 IEEE International Conference on Cluster Computing | Pub Date: 2012-09-24 | DOI: 10.1109/CLUSTER.2012.59
Yaoguang Wang, Weiming Lu, Bin-bin Yu, Baogang Wei
Abstract: High availability is a valuable property in distributed systems. The master-slave model is widely used in data management systems for high performance. However, many master-slave systems still have a single point of failure (SPOF) in the single master node. We develop a generalized solution that covers several common use cases across different master-slave systems. The solution provides high availability as a service (HAaaS): it uses a shared storage infrastructure to make the master stateless and performs automatic failover of the high-availability service. We deploy HAaaS in many master-slave subsystems of our unstructured data management system (UDMS) to make UDMS highly available. The experiments demonstrate the feasibility and efficiency of our solution.
Citations: 1
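To illustrate the failover pattern the abstract describes, here is a toy C sketch of heartbeat-based master failover: because the master's state lives in shared storage, a promoted slave can take over without state transfer. The three-missed-heartbeats policy and all names are assumptions, not details from the paper.

```c
/* Toy failover loop in the spirit of HAaaS.  State lives in shared
 * storage, so promoting a slave needs no state transfer. */
#include <stdio.h>
#include <string.h>

typedef struct { char role[8]; int missed; } node_t;

static void promote(node_t *n) { strcpy(n->role, "master"); }

int main(void) {
    node_t nodes[3] = {{"master", 0}, {"slave", 0}, {"slave", 0}};
    /* Simulated heartbeat results from the master per tick: 0 = missed. */
    int master_beats[] = {1, 1, 0, 0, 0, 1};

    for (int t = 0; t < 6; t++) {
        if (master_beats[t]) nodes[0].missed = 0;
        else                 nodes[0].missed++;

        if (nodes[0].missed >= 3 && strcmp(nodes[0].role, "failed") != 0) {
            strcpy(nodes[0].role, "failed");
            promote(&nodes[1]);  /* slave re-reads state from shared storage */
            printf("tick %d: master failed, node 1 promoted\n", t);
        }
    }
    return 0;
}
```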
A New End-to-End Flow-Control Mechanism for High Performance Computing Clusters
2012 IEEE International Conference on Cluster Computing | Pub Date: 2012-09-24 | DOI: 10.1109/CLUSTER.2012.15
Javier Prades, F. Silla, J. Duato, H. Fröning, M. Nüssle
Abstract: High performance computing usually leverages messaging libraries such as MPI or GASNet to exchange data among processes in large-scale clusters. These libraries in turn use specialized low-level networking layers to extract as much performance as possible from hardware interconnects such as InfiniBand or Myrinet; EXTOLL is another emerging technology targeted at high performance clusters. Such low-level networking layers require some kind of flow control to prevent buffer overflows at the receiver side. In this paper we present a new flow-control mechanism that adapts the buffering resources used by a process to the parallel application's communication pattern and to the varying activity among communicating peers. Tests carried out on a 64-node, 1024-core EXTOLL cluster show that our new dynamic flow-control mechanism provides extraordinarily high buffer efficiency along with very low overhead, which is reduced by a factor of 8 to 10.
Citations: 3
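The underlying discipline is credit-based: a sender may transmit only while it holds receiver buffer credits, which makes overflow impossible by construction. The C sketch below simulates that invariant with fixed rates; the adaptive resizing of buffering resources that is the paper's actual contribution is omitted, and all the numbers are made up.

```c
/* Credit-based end-to-end flow control in miniature: the sender stalls
 * when out of credits; the receiver returns a credit each time it
 * drains a message, so its buffer can never overflow. */
#include <stdio.h>

int main(void) {
    int credits = 4;      /* receiver advertised 4 buffer slots */
    int rx_buffered = 0;
    int sent = 0, stalls = 0;

    for (int step = 0; step < 20; step++) {
        if (credits > 0) {            /* send one message */
            credits--; rx_buffered++; sent++;
        } else {
            stalls++;                 /* sender stalls: no overflow possible */
        }
        if (step % 3 == 0 && rx_buffered > 0) {
            rx_buffered--;            /* receiver consumes a message ... */
            credits++;                /* ... and returns a credit        */
        }
    }
    printf("sent=%d stalls=%d buffered=%d\n", sent, stalls, rx_buffered);
    return 0;
}
```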
Evaluating Power-Monitoring Capabilities on IBM Blue Gene/P and Blue Gene/Q
2012 IEEE International Conference on Cluster Computing | Pub Date: 2012-09-24 | DOI: 10.1109/CLUSTER.2012.62
Kazutomo Yoshii, K. Iskra, Rinku Gupta, P. Beckman, V. Vishwanath, Chenjie Yu, S. Coghlan
Abstract: Power consumption is becoming a critical factor as we continue our quest toward exascale computing. Yet the actual power utilization of a complete system is an insufficiently studied research area. Estimating the power consumption of a large-scale system is a nontrivial task because a large number of components are involved and because power requirements are affected by (unpredictable) workloads. What is clearly needed is a power-monitoring infrastructure that can provide timely and accurate feedback to system developers and application writers so that they can optimize the use of this precious resource. Many existing large-scale installations do feature power-monitoring sensors; however, those are part of environmental and health monitoring subsystems and were not designed with application-level power consumption measurements in mind. In this paper, we evaluate the existing power monitoring of IBM Blue Gene systems, with the goal of understanding what capabilities are available and how they fare with respect to spatial and temporal resolution, accuracy, latency, and other characteristics. We find that with a careful choice of dedicated microbenchmarks, we can obtain meaningful power consumption data even on Blue Gene/P, where the interval between available data points is measured in minutes. We next evaluate the monitoring subsystem on Blue Gene/Q and are able to study the power characteristics of its FPU and memory subsystems. We find this monitoring subsystem capable of providing second-scale resolution of power data, conveniently separated between node components, with seven seconds of latency. This represents a significant improvement in power-monitoring infrastructure, and we hope future systems will enable real-time power measurement in order to better understand application behavior at a finer granularity.
Citations: 27
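A minimal sketch of how coarse-grained sensors can still yield per-phase power figures, which is essentially the microbenchmark methodology the abstract alludes to for Blue Gene/P: run each phase long enough to span several sampling intervals, then average the samples inside each phase. The wattages and phase layout below are fabricated for shape only.

```c
/* Attributing coarse (one-per-minute) power samples to benchmark
 * phases: each phase spans two one-minute samples, and per-phase
 * averages recover the phase's power signature. */
#include <stdio.h>

int main(void) {
    double samples[] = {210, 212, 305, 310, 214, 211};  /* watts */
    const char *phase[] = {"idle", "fpu-bench", "idle"};
    int per_phase = 2;   /* samples per phase */

    for (int p = 0; p < 3; p++) {
        double sum = 0;
        for (int i = 0; i < per_phase; i++)
            sum += samples[p * per_phase + i];
        printf("%-10s avg %.1f W\n", phase[p], sum / per_phase);
    }
    return 0;
}
```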
Minimizing Network Contention in InfiniBand Clusters with a QoS-Aware Data-Staging Framework
2012 IEEE International Conference on Cluster Computing | Pub Date: 2012-09-24 | DOI: 10.1109/CLUSTER.2012.90
R. Rajachandrasekar, Jai Jaswani, H. Subramoni, D. Panda
Abstract: The rapid growth of supercomputing systems, both in scale and complexity, has been accompanied by degradation in system efficiency. The sheer abundance of resources, including millions of cores, vast amounts of physical memory, and high-bandwidth networks, is heavily under-utilized. This happens when resources are time-shared among parallel applications that are scheduled to run on a subset of compute nodes in an exclusive manner. Several space-sharing techniques proposed in the literature allow parallel applications to be co-located on compute nodes and share resources with each other. Although this leads to better system efficiency, it also causes contention for system resources. In this work, we specifically address the problem of network contention caused by parallel applications and file systems sharing network resources simultaneously. We leverage the Quality-of-Service (QoS) capabilities of the widely used InfiniBand interconnect to make our data-staging file system QoS-aware. This is a user-level framework that is agnostic of the file system and MPI implementation. Using this file system, we demonstrate the isolation of file-system traffic from MPI communication traffic, thereby reducing network contention. Experimental results show that MPI point-to-point latency can be reduced by up to 320 microseconds and bandwidth improved by up to 674 MB/s in the presence of contention with I/O traffic. Furthermore, we were able to reduce the runtime of the AWP-ODC MPI application by about 9.89% in the presence of network contention, and to reduce the time spent in communication by the NAS CG kernel by 23.46%.
Citations: 9
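The isolation idea rests on InfiniBand service levels (SLs), which the fabric maps to distinct virtual lanes. A minimal C sketch of the classification step follows; the specific SL numbers are assumptions, not the paper's configuration. In actual verbs code, the SL of a connected QP is supplied through the ah_attr.sl field of struct ibv_qp_attr when the QP is transitioned to RTR.

```c
/* Classification step for QoS-aware staging: MPI traffic and file-system
 * I/O traffic get different InfiniBand service levels (SLs), which the
 * subnet maps to separate virtual lanes so the flows do not contend.
 * The SL values below are assumptions, not the paper's setup. */
#include <stdio.h>

enum traffic_class { TC_MPI, TC_IO };

static int sl_for(enum traffic_class tc) {
    return tc == TC_MPI ? 0 : 4;   /* distinct SLs -> distinct VLs */
}

int main(void) {
    printf("MPI traffic -> SL %d\n", sl_for(TC_MPI));
    printf("I/O staging -> SL %d\n", sl_for(TC_IO));
    return 0;
}
```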
Transactional Multi-row Access Guarantee in the Key-Value Store
2012 IEEE International Conference on Cluster Computing | Pub Date: 2012-09-24 | DOI: 10.1109/CLUSTER.2012.57
Yaoguang Wang, Weiming Lu, Baogang Wei
Abstract: The emergence of cloud computing and big data has driven the development of novel data stores known as NoSQL. A mass of such stores has been developed, and most are key-value stores, in which data are partitioned by key and a key uniquely identifies a row. However, the requirement for efficiency and scalability means they provide only single-row atomic access, while in the big data era more and more applications built on key-value stores need transactional functionality across multiple rows. It is therefore natural to implement multi-row transaction management for key-value stores. In this paper, we implement a transaction processing system (TrasPS) that guarantees transactional multi-row access from the application client to the key-value store in our unstructured data management system (UDMS). We also provide fault tolerance and recovery for the transactions. The implementation and experiments in our UDMS show that TrasPS can provide scalable multi-row access functionality at very low overhead.
Citations: 4
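The abstract does not spell out TrasPS's protocol, so the C sketch below illustrates one standard way to obtain multi-row atomicity over a key-value store: buffer writes, validate per-row versions at commit time, and install all writes only if every validation passes. Treat it as a generic optimistic scheme, not the authors' design.

```c
/* Toy multi-row transaction over an in-memory "key-value store":
 * validate the versions each write was based on, then install all
 * writes atomically (single-threaded here, so installation is atomic). */
#include <stdio.h>

#define NROWS 4

static int value[NROWS], version[NROWS];

typedef struct { int row, newval, seen_version; } write_op;

static int commit(write_op *w, int n) {
    for (int i = 0; i < n; i++)                /* validate every row  */
        if (version[w[i].row] != w[i].seen_version) return 0;
    for (int i = 0; i < n; i++) {              /* install all writes  */
        value[w[i].row] = w[i].newval;
        version[w[i].row]++;
    }
    return 1;
}

int main(void) {
    write_op txn[] = {{0, 10, 0}, {2, 30, 0}};
    printf("commit #1: %s\n", commit(txn, 2) ? "ok" : "abort");
    /* The second attempt reuses stale versions and must abort. */
    printf("commit #2: %s\n", commit(txn, 2) ? "ok" : "abort");
    return 0;
}
```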
Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments
2012 IEEE International Conference on Cluster Computing | Pub Date: 2012-09-24 | DOI: 10.1109/CLUSTER.2012.72
John Jenkins, James Dinan, P. Balaji, N. Samatova, R. Thakur
Abstract: The lack of efficient and transparent interaction with GPU data in hybrid MPI+GPU environments challenges GPU acceleration of large-scale scientific computations. A particular challenge is the transfer of noncontiguous data to and from GPU memory: MPI implementations currently do not provide an efficient means of utilizing datatypes for noncontiguous communication of data in GPU memory. To address this gap, we present an MPI datatype-processing system capable of efficiently processing arbitrary datatypes directly on the GPU. We present a means for converting conventional datatype representations into a GPU-amenable format. Fine-grained, element-level parallelism is then utilized by a GPU kernel to perform in-device packing and unpacking of noncontiguous elements. We demonstrate a several-fold performance improvement for noncontiguous column vectors, 3D array slices, and 4D array subvolumes over CUDA-based alternatives. Compared with optimized, layout-specific implementations, our approach incurs low overhead while enabling the packing of datatypes that do not have a direct CUDA equivalent. These improvements are shown to translate into significant improvements in end-to-end, GPU-to-GPU communication time. In addition, we identify and evaluate communication patterns that may cause resource contention with packing operations, providing a baseline for adaptively selecting data-processing strategies.
Citations: 35
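The key observation is that packing a noncontiguous datatype is embarrassingly parallel at element granularity. The C loop below is a CPU rendering of what one GPU thread per element would compute for a strided column-vector datatype; the matrix shape and column index are arbitrary.

```c
/* CPU rendering of element-level packing: each element of a strided
 * "column vector" datatype is gathered independently, so on a GPU
 * every thread can pack exactly one element. */
#include <stdio.h>

#define ROWS 4
#define COLS 5

int main(void) {
    double a[ROWS][COLS], packed[ROWS];
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            a[i][j] = 10.0 * i + j;

    int col = 2;                   /* MPI_Type_vector-like column datatype */
    for (int e = 0; e < ROWS; e++) /* 'e' would be the GPU thread index    */
        packed[e] = a[e][col];     /* stride COLS, one element per thread  */

    for (int e = 0; e < ROWS; e++) printf("%.0f ", packed[e]);
    printf("\n");
    return 0;
}
```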
Mastiff: A MapReduce-based System for Time-Based Big Data Analytics
2012 IEEE International Conference on Cluster Computing | Pub Date: 2012-09-24 | DOI: 10.1109/CLUSTER.2012.10
Sijie Guo, Jin Xiong, Weiping Wang, Rubao Lee
Abstract: Existing MapReduce-based warehousing systems are not specially optimized for time-based big data analysis applications. Such applications have two characteristics: 1) data are continuously generated and must be stored persistently for a long period of time; 2) applications usually process data within some time period, so typical queries use time-related predicates. Time-based big data analytics requires both high data-loading speed and high query-execution performance. However, existing systems, including current MapReduce-based solutions, do not solve this problem well because the two requirements are contradictory. We have implemented a MapReduce-based system, called Mastiff, that achieves both high data-loading speed and high query performance. Mastiff exploits a systematic combination of a column-group store structure and a lightweight helper structure. Furthermore, Mastiff uses an optimized table-scan method and a column-based query execution engine to boost query performance. Based on extensive experimental results with diverse workloads, we show that Mastiff significantly outperforms existing systems including Hive, HadoopDB, and GridSQL.
Citations: 25
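One concrete way to reconcile fast loading with time-predicate queries is to partition data by time at load and prune partitions at query time; the C sketch below illustrates that pruning effect. The per-hour granularity and the single-column "store" are assumptions for illustration, not Mastiff's actual layout.

```c
/* Time-pruning sketch: rows land in per-hour partitions at load time,
 * and a query with a time predicate scans only the matching partitions. */
#include <stdio.h>

#define NPART 4                  /* one partition per hour, hours 0..3 */

static int events[NPART][8];     /* column store reduced to one column */
static int count[NPART];

static void load(int hour, int v) { events[hour][count[hour]++] = v; }

int main(void) {
    load(0, 5); load(1, 7); load(1, 9); load(3, 2);

    /* SELECT sum(v) WHERE hour BETWEEN 1 AND 2: prune hours 0 and 3. */
    int lo = 1, hi = 2, sum = 0, scanned = 0;
    for (int h = lo; h <= hi; h++) {
        scanned++;
        for (int i = 0; i < count[h]; i++) sum += events[h][i];
    }
    printf("sum=%d, scanned %d/%d partitions\n", sum, scanned, NPART);
    return 0;
}
```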
On the Effects of CPU Caches on MPI Point-to-Point Communications
2012 IEEE International Conference on Cluster Computing | Pub Date: 2012-09-24 | DOI: 10.1109/CLUSTER.2012.22
Simone Pellegrini, T. Hoefler, T. Fahringer
Abstract: Several researchers have investigated the placement of communication calls in message-passing parallel codes. The current rule of thumb is to maximize communication/computation overlap with early binding. In this work, we demonstrate that this is not the only design constraint, because CPU caches can have a significant impact on communications. We conduct an empirical study of the interaction between CPU caching and communication for several different communication scenarios. We use the gained insight to formulate a set of intuitive rules for communication call placement and show how our rules can be applied to practical codes. Our optimized codes show an improvement of up to 40% for a simple stencil code. Our work is a first step toward communication optimization by moving communication calls; we expect that future communication-aware compilers will use our insights as a standard technique for moving communication calls to optimize performance.
Citations: 7
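The placement question the paper studies can be made concrete with a two-rank MPI fragment: posting the send immediately after the buffer is produced lets the transfer read cache-resident data, whereas moving it after unrelated computation forces it to stream from memory. The sketch below shows the early-binding variant; the buffer sizes are arbitrary.

```c
/* Early binding: post the nonblocking send right after the buffer is
 * produced (while it is still cache-resident), then overlap unrelated
 * work before waiting.  Compile with mpicc and run with 2 ranks. */
#include <mpi.h>
#include <stdio.h>

#define N (1 << 20)

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    static double buf[N], scratch[N];

    if (rank == 0) {
        for (int i = 0; i < N; i++) buf[i] = i;     /* produce: buf is hot */
        MPI_Request r;
        MPI_Isend(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &r); /* early */
        for (int i = 0; i < N; i++) scratch[i] = 2.0 * i; /* overlap work */
        MPI_Wait(&r, MPI_STATUS_IGNORE);
        /* Moving the Isend below the scratch loop would overlap the same
         * work, but the transfer would then read buf from memory rather
         * than from cache. */
    } else if (rank == 1) {
        MPI_Recv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}
```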
Autotuning Stencil-Based Computations on GPUs
2012 IEEE International Conference on Cluster Computing | Pub Date: 2012-09-24 | DOI: 10.1109/CLUSTER.2012.46
A. Mametjanov, Daniel Lowell, Ching-Chen Ma, B. Norris
Abstract: Finite-difference, stencil-based discretization approaches are widely used in the solution of partial differential equations describing physical phenomena. The Newton-Krylov iterative methods commonly used in stencil-based solutions generate matrices that exhibit diagonal sparsity patterns. To exploit these structures on modern GPUs, we extend the standard diagonal sparse-matrix representation and define new matrix and vector data types in the PETSc parallel numerical toolkit. We create tunable CUDA implementations of the operations associated with these types after identifying a number of GPU-specific optimizations and tuning parameters for them. We discuss our implementation of GPU autotuning capabilities in the Orio framework and present performance results for several kernels, comparing them with vendor-tuned library implementations.
Citations: 48
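For context, the diagonal (DIA) sparse format the paper builds on stores each nonzero diagonal densely along with its offset from the main diagonal. The C program below is a plain reference version of the SpMV that the authors' tuned CUDA kernels implement; on a GPU, the outer loop over rows becomes one thread per row. The matrix size and values are arbitrary.

```c
/* Reference SpMV in DIA format: a tridiagonal (-1, 2, -1) matrix is
 * stored as three dense diagonals plus their offsets.  diag[d][i] holds
 * the value at row i, column i + offset[d]. */
#include <stdio.h>

#define N 6
#define NDIAG 3

int main(void) {
    int offset[NDIAG] = {-1, 0, 1};   /* sub-, main, super-diagonal */
    double diag[NDIAG][N], x[N], y[N];
    for (int j = 0; j < N; j++) {
        diag[0][j] = -1; diag[1][j] = 2; diag[2][j] = -1;
        x[j] = 1.0;
    }
    for (int i = 0; i < N; i++) {     /* one GPU thread per row */
        double sum = 0;
        for (int d = 0; d < NDIAG; d++) {
            int j = i + offset[d];
            if (j >= 0 && j < N) sum += diag[d][i] * x[j];
        }
        y[i] = sum;
    }
    for (int i = 0; i < N; i++) printf("%.0f ", y[i]);  /* 1 0 0 0 0 1 */
    printf("\n");
    return 0;
}
```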