{"title":"An Adaptive Data Prefetcher for High-Performance Processors","authors":"Yong Chen, Huaiyu Zhu, Xian-He Sun","doi":"10.1109/CCGRID.2010.61","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.61","url":null,"abstract":"While computing speed continues increasing rapidly, data-access technology is lagging behind. Data-access delay, not the processor speed, becomes the leading performance bottleneck of high-end/high-performance computing. Prefetching is an effective solution to masking the gap between computing speed and data-access speed. Existing works of prefetching, however, are very conservative in general, due to the computing power consumption concern of the past. They suffer in effectiveness especially when applications' access pattern changes. In this study, we propose an Algorithm-level Feedback-controlled Adaptive (AFA) data prefetcher to address these issues. The AFA prefetcher is based on the Data-Access History Cache, a hardware structure that is specifically designed for data prefetching. It provides an algorithm-level adaptation and is capable of dynamically adapting to appropriate prefetching algorithms at runtime. We have conducted extensive simulation testing with the SimpleScalar simulator to validate the design and to illustrate the performance gain. The simulation results show that the AFA prefetcher is effective and achieves considerable IPC (Instructions Per Cycle) improvement on average.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"47 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120851160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Realistic Integrated Model of Parallel System Workloads","authors":"T. Minh, L. Wolters, D. Epema","doi":"10.1109/CCGRID.2010.32","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.32","url":null,"abstract":"Performance evaluation is a significant step in the study of scheduling algorithms in large-scale parallel systems ranging from supercomputers to clusters and grids. One of the key factors that have a strong effect on the evaluation results is the workloads (or traces) used in experiments. In practice, several researchers use unrealistic synthetic workloads in their scheduling evaluations because they lack models that can help generate realistic synthetic workloads. In this paper we propose a full model to capture the following characteristics of real parallel system workloads: 1) long range dependence in the job arrival process, 2) temporal and spatial burstiness, 3) bag-of-tasks behaviour, and 4) correlation between the runtime and the number of processors. Validation of our model with real traces shows that our model not only captures the above characteristics but also fits the marginal distributions well. In addition, we also present an approach to quantify burstiness in a job arrival process (temporal) as well as burstiness in the load of a trace (spatial).","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"318 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116258425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing Accelerator-Based Distributed Systems for High Performance","authors":"M. M. Rafique, A. Butt, Dimitrios S. Nikolopoulos","doi":"10.1109/CCGRID.2010.109","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.109","url":null,"abstract":"Multi-core processors with accelerators are becoming commodity components for high-performance computing at scale. While accelerator-based processors have been studied in some detail, the design and management of clusters based on these processors have not received the same focus. In this paper, we present an exploration of four design and resource management alternatives, which can be used on large-scale asymmetric clusters with accelerators. Moreover, we adapt the popular MapReduce programming model to our proposed configurations. We enhance MapReduce with new dynamic data streaming and workload scheduling capabilities, which enable application writers to use asymmetric accelerator-based clusters without being concerned with the capabilities of individual components. We present an evaluation of the presented designs in a physical setting and show that our designs can provide significant performance advantages. Compared to a standard static MapReduce design, we achieve 62.5%, 73.1%, and 82.2% performance improvement using accelerators with limited general-purpose resources, well-provisioned shared general-purpose resources, and well-provisioned dedicated general-purpose resources, respectively.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124500986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Heuristic Query Optimization Approach for Heterogeneous Environments","authors":"P. Beran, W. Mach, R. Vigne, Juergen Mangler, E. Schikuta","doi":"10.1109/CCGRID.2010.65","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.65","url":null,"abstract":"In a rapidly growing digital world the ability to discover, query and access data efficiently is one of the major challenges we are struggling with today. Google has done a tremendous job by enabling casual users to easily and efficiently search for Web documents of interest. However, a comparable mechanism to query data stocks located in distributed databases is not available yet. Therefore our research focuses on the query optimization of distributed database queries, considering a huge variety of different infrastructures and algorithms. This paper introduces a novel heuristic query optimization approach based on a multi-layered blackboard mechanism. Moreover, a short evaluation scenario confirms our findings that even small changes in the structure of a query execution tree (QET) can lead to significant performance improvements.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126399303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integration of Heterogeneous and Non-dedicated Environments for R","authors":"Gonzalo Vera, R. Suppi","doi":"10.1109/CCGRID.2010.102","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.102","url":null,"abstract":"Parallel computing is becoming essential for nowadays data analysis in several disciplines. In order to profit from parallel processing of experimental data, specialized skills, software tools and suitable computing resources are required. Desktop grids and volunteer-based systems have proved themselves as powerful options where distributed idle resources from heterogeneous computers are aggregated to build powerful metacomputers. Software solutions are required to automate and assist the process of transformation and adaptation of current and new applications to run in these environments. Finally, it is desirable, for the same tool, to provide an efficient solution to orchestrate the execution of these programs using a diversity of dynamic environments. In this paper we describe an implementation of an integrated solution for the R language which allows the transformation and execution of parallel loops in heterogeneous and non-dedicated environments. The results obtained allow us to prove the feasibility of our proposal. Furthermore, several issues that tools like this must consider to improve their performance when integrating heterogeneous systems are described.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133423920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ConnectX-2 InfiniBand Management Queues: First Investigation of the New Support for Network Offloaded Collective Operations","authors":"R. Graham, Steve Poole, Pavel Shamis, Gil Bloch, N. Bloch, H. Chapman, Michael Kagan, Ariel Shahar, Ishai Rabinovitz, G. Shainer","doi":"10.1109/CCGRID.2010.9","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.9","url":null,"abstract":"This paper introduces the newly developed InfiniBand (IB) Management Queue capability, used by the Host Channel Adapter (HCA) to manage network task data flow dependencies, and progress the communications associated with such flows. These tasks include sends, receives, and the newly supported wait task, and are scheduled by the HCA based on a data dependency description provided by the user. This functionality is supported by the ConnectX-2 HCA, and provides the means for delegating collective communication management and progress to the HCA, also known as collective communication offload. This provides a means for overlapping collective communications managed by the HCA and computation on the Central Processing Unit (CPU), thus making it possible to reduce the impact of system noise on parallel applications using collective operations. This paper further describes how this new capability can be used to implement scalable Message Passing Interface (MPI) collective operations, describing the high level details of how this new capability is used to implement the MPI Barrier collective operation, focusing on the latency sensitive performance aspects of this new capability. This paper concludes with small scale benchmark experiments comparing implementations of the barrier collective operation, using the new network offload capabilities, with established point-to-point based implementations of these same algorithms, which manage the data flow using the central processing unit. These early results demonstrate the promise this new capability provides to improve the scalability of high-performance applications using collective communications. The latency of the HCA based implementation of the barrier is similar to that of the best performing point-to-point based implementation managed by the central processing unit, starting to outperform these as the number of processes involved in the collective operation increases.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"216 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124269307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Cloud Constructs and Predictive Analysis to Enable Pre-Failure Process Migration in HPC Systems","authors":"J. Brandt, Frank Chen, Vincent De Sapio, A. Gentile, J. Mayo, P. Pébay, D. Roe, D. Thompson, M. Wong","doi":"10.1109/CCGRID.2010.31","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.31","url":null,"abstract":"Accurate failure prediction in conjunction with efficient process migration facilities including some Cloud constructs can enable failure avoidance in large-scale high performance computing (HPC) platforms. In this work we demonstrate a prototype system that incorporates our probabilistic failure prediction system with virtualization mechanisms and techniques to provide a whole system approach to failure avoidance. This work utilizes a failure scenario based on a real-world HPC case study.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"243 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114389015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Methodology for Efficient Execution of SPMD Applications on Multicore Environments","authors":"Ronal Muresano, Dolores Rexachs, E. Luque","doi":"10.1109/CCGRID.2010.67","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.67","url":null,"abstract":"The need to efficiently execute applications in heterogeneous environments is a current challenge for parallel computing programmers. The communication heterogeneities found in multicore clusters need to be addressed to improve efficiency and speedup. This work presents a methodology developed for SPMD applications, which is centered on managing communication heterogeneities and improving system efficiency on multicore clusters. The methodology is composed of three phases: characterization, mapping strategy, and scheduling policy. We focus on SPMD applications which are designed through a message-passing library for communication, and selected according to their synchronicity and communications volume. The novel contribution of this methodology is it determines the approximate number of cores necessary to achieve a suitable solution with a good execution time, while the efficiency level is maintained over a threshold defined by users. Applying this methodology gave results showing a maximum improvement in efficiency of around 43% in the SPMD applications tested.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116872876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy Efficient Resource Management in Virtualized Cloud Data Centers","authors":"A. Beloglazov, R. Buyya","doi":"10.1109/CCGRID.2010.46","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.46","url":null,"abstract":"Rapid growth of the demand for computational power by scientific, business and web-applications has led to the creation of large-scale data centers consuming enormous amounts of electrical power. We propose an energy efficient resource management system for virtualized Cloud data centers that reduces operational costs and provides required Quality of Service (QoS). Energy savings are achieved by continuous consolidation of VMs according to current utilization of resources, virtual network topologies established between VMs and thermal state of computing nodes. We present first results of simulation-driven evaluation of heuristics for dynamic reallocation of VMs using live migration according to current requirements for CPU performance. The results show that the proposed technique brings substantial energy savings, while ensuring reliable QoS. This justifies further investigation and development of the proposed resource management system.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117035348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sky Computing: When Multiple Clouds Become One","authors":"J. Fortes","doi":"10.1109/CCGRID.2010.136","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.136","url":null,"abstract":"Summary form only given. The growing number of announced commercial and scientific clouds strongly suggests that in the near future these providers will be differentiated according to the types of their services, their cost, availability and quality. Users will be able to use these and other criteria to determine which clouds best suit their needs, a plausible scenario being the case when users need to aggregate capabilities provided by different clouds. In such scenarios it will be essential to provide virtual networking technologies that enable providers to support cross-cloud communication and users to deploy cross-cloud applications. This talk will describe one such technology, its salient features and remaining challenges. It will also put forward the idea of virtual clouds, i.e. providers of computing services overlaid on more than one cloud. A virtual cloud spans across multiple cloud providers and presents the view of a single logical cloud. Virtual clouds would enable high-level computing services to be provided by third parties who do not own physical resources, could be short or long-lived and highly dynamic. Enabling technologies, challenges and examples of sky computing will be presented.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117301182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}