{"title":"Harnessing Shared Wide-area Clusters for Dynamic High End Services","authors":"Ramesh Viswanath, M. Ahamad, Karsten Schwan Georgia","doi":"10.1109/CLUSTR.2005.347073","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347073","url":null,"abstract":"Current trends in distributed computing have been moving towards the use of wide-area clusters that are managed by different entities. In this paper, we introduce middleware-level support to facilitate computational resource sharing with service guarantees using non-dedicated server systems in wide-area clusters. The aim is to ensure that sets of computational tasks submitted to such high end systems are completed reliably and in a timely fashion. Our approach develops methods that enhance basic job scheduling with information about the execution history and trust values for the computational nodes to which jobs are assigned. In essence, job scheduling is enriched with trust models constructed and maintained at runtime, and scheduling decisions are based on metrics that capture trust in remote server systems. An implementation of the approach is evaluated on Planetlab, with initial results demonstrating good success rates in completing jobs within their specific service level agreements, including under conditions of high system loads. Additional results are attained with a variant of the scheduling algorithm that uses redundancy to further improve the likelihood of meeting end user SLAs. A representative application considered in this paper is remote data visualization, where substantial computation must be applied to data before displaying it to end users. 
SLAs capture desired end-to-end delay, and distributed server or cluster systems are used to perform the required computations in a timely manner","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125931486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
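The trust-enriched scheduling idea in this abstract can be sketched as follows: maintain a per-node trust score updated from execution history, and assign each job to the most-trusted nodes, optionally with redundancy. This is an illustrative sketch only; the update rule, the `alpha` learning rate, and all function names are assumptions, not the paper's actual algorithm.

```python
def update_trust(trust, node, success, alpha=0.3):
    """Exponentially weighted update of a node's trust score from one
    job outcome. `alpha` (the learning rate) is an assumed parameter."""
    outcome = 1.0 if success else 0.0
    trust[node] = (1 - alpha) * trust[node] + alpha * outcome
    return trust[node]

def assign_job(trust, redundancy=1):
    """Assign one job to the `redundancy` most-trusted nodes; running
    redundant copies raises the chance of meeting the job's SLA."""
    ranked = sorted(trust, key=trust.get, reverse=True)
    return ranked[:redundancy]
```

A scheduler would call `update_trust` after each job completes (or misses its deadline) and `assign_job` when dispatching, so the trust model is constructed and maintained entirely at runtime, as the paper describes.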
{"title":"Online Critical Path Profiling for Parallel Applications","authors":"Wenbin Zhu, P. Bridges, A. Maccabe","doi":"10.1109/CLUSTR.2005.347048","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347048","url":null,"abstract":"Online monitoring of parallel applications is increasingly important for techniques such as load balancing, protocol adaptation, and online anomaly detection. Unfortunately, existing online monitoring techniques only monitor individual hosts in a distributed-memory parallel application. In this paper, we show how a new monitoring technique, message-centric monitoring, can be used for online monitoring of the complete critical path in distributed-memory parallel applications. Results from an MPI-based message-centric monitoring prototype called IMPuLSE show that it has less than 3% runtime overhead, accurately measures whole-system performance as the application runs, and captures data that can be used by nodes to detect unusual system behaviors at runtime","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128627413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RNIC-PI: The last step in standardizing RDMA","authors":"Ramesh VelurEunni","doi":"10.1109/CLUSTR.2005.347030","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347030","url":null,"abstract":"The hardware-software interaction for the industry-standard remote direct memory access (RDMA) devices have only been defined as an abstract set of operations. While the abstract definition has allowed vendors to build RDMA adapters, the industry still lacks a generic software interface definition that hardware and system vendors can code to. This paper presents the industry-standard RDMA NIC Programming Interface (RNIC-PI) and its flexible architecture, the role it plays in the industry and the challenges encountered in reaching agreement on a common set of semantics that is acceptable to a variety of operating systems and RDMA NIC vendors. A vehicle to translate the interface into reality on Linux is described in the end along with some suggestions on future course of action","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125457072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Out-of-Core Matlab for Extreme Virtual Memory","authors":"Hahn Kim, J. Kepner, C. Kahn","doi":"10.1109/CLUSTR.2005.347016","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347016","url":null,"abstract":"Summary form only given. Large data sets that cannot fit in memory can be addressed with out-of-core methods, which use memory as a \"window \" to view a section of the data stored on disk at a time. The parallel Matlab for eXtreme virtual memory (pMatlab XVM) library adds out-of-core extensions to the parallel Matlab (pMatlab) library. We have applied pMatlab XVM to the DARPA high productivity computing systems' HPCchallenge FFT benchmark. The benchmark was run using several different implementations: C+MPI, pMatlab, pMatlab hand coded for out-of-core and pMatlab XVM. These experiments found 1) the performance of the C+MPI and pMatlab versions were comparable; 2) the out-of-core versions deliver 80% of the performance of the in-core versions; 3) the out-of-core versions were able to perform a 1 terabyte (64 billion point) FFT and 4) the pMatlab XVM program was smaller, easier to implement and verify, and more efficient than its hand coded equivalent. We are transitioning this technology to several DoD signal processing applications and plan to apply pMatlab XVM to the full HPCchallenge benchmark suite. 
Using next generation hardware, problems sizes a factor of 100 to 1000 times larger should be feasible","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130749406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
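The out-of-core "window" technique the abstract describes (stream disk-resident data through a fixed-size memory buffer) can be sketched in Python; a simple streamed reduction stands in for the FFT stages, and the chunk size and function name are assumptions for illustration, not pMatlab XVM's API.

```python
import array

def out_of_core_sum(path, chunk_elems=1024):
    """Stream a large file of float64s through a fixed-size in-memory
    window, accumulating a result without ever loading the whole
    dataset. An out-of-core FFT applies its per-stage butterflies to
    each window the same way; summation is used here only as a
    simple stand-in."""
    total = 0.0
    with open(path, "rb") as f:
        while True:
            buf = array.array("d")
            try:
                buf.fromfile(f, chunk_elems)
            except EOFError:
                total += sum(buf)  # partial final chunk (may be empty)
                break
            total += sum(buf)
    return total
```

The key property is that peak memory is bounded by `chunk_elems`, not by the dataset size, which is what makes terabyte-scale problems feasible on nodes with gigabytes of RAM.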
{"title":"Adaptive Management of a Utility Computing","authors":"Yingyin Jiang, Dan Meng, Yi Liang, Danjun Liu, Jianfeng Zhan","doi":"10.1109/CLUSTR.2005.347012","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347012","url":null,"abstract":"The complexity of the high performance Web-based application challenges the traditional approaches, which fail to guarantee the reliability and real-time performance required. In this paper, we have studied the adaptive mechanisms for managing such applications and explained them based on a prototype of an adaptive application management system (AMUS) in cluster. AMUS is composed of the SLA event-driven global resource manager, the server resource manager and the self-adapting application systems based on feedback control theory. The adoption of feedback control theory supports the application resource control in the case of the resource contention and the guarantee of the QoS performance in the changing environment","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132487531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Search-based Job Scheduling for Parallel Computer Workloads","authors":"S. Vasupongayya, S. Chiang, Barton C. Massey","doi":"10.1109/CLUSTR.2005.347037","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347037","url":null,"abstract":"To balance performance goals and allow administrators to declaratively specify high-level performance goals, we apply complete search algorithms to design on-line job scheduling policies for workloads that run on parallel computer systems. We formulate a hierarchical two-level objective that contains two goals commonly placed on parallel computer systems: (1) minimizing the total excessive wait; (2) minimizing the average slowdown. Ten monthly workloads that ran on a Linux cluster (IA-64) from NCSA are used in our simulation of policies. A wide range of measures are used for performance evaluation, including the average slowdown, average wait, maximum wait, and new measures based on excessive wait. For the workloads studied, our results show that the best search-based scheduling policy (i.e., DDS/lxf/dynB) reported here simultaneously beats both FCFS-backfill and LXF-backfill, each roughly providing a lower bound on maximum wait and the average slowdown, respectively, among backfill policies","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130055385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The ELIHE High-Performance Cluster","authors":"Violeta Holmes, Terence McDonough","doi":"10.1109/CLUSTR.2005.347091","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347091","url":null,"abstract":"Summary form only given. In this poster, we present our experience in implementing a high performance computing cluster for teaching parallel computing theory and development of parallel applications. In teaching parallel and high performance computing, there is often a gap between potential performance taught in the lectures and those practically experienced in exercises in the laboratory. The development of the ELIHE cluster provides us with an opportunity take a hands on approach in teaching programming environments, tools, and libraries for development of parallel applications, parallel computation, architectures, message passing and shared memory paradigms using MPI and OpenMP, etc, at both undergraduate and graduate level. The ELIHE HP cluster consists of 9 computational nodes and a master node. All the nodes in the cluster are commodity systems - PCs, running commodity software - Linux, and CLIC Mandrake. 
Creating the ELIHE cluster has fulfilled two important goals: to design and implement a HP cluster for teaching parallel computing architectures in the School of Science and Technology, and to promote the use of high performance computer technology for research to faculty members and students","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117212508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Grid and Cluster Matrix Computation with Persistent Storage and Out-of-core Programming","authors":"Lamine M. Aouad, S. Petiton, M. Sato","doi":"10.1109/CLUSTR.2005.347071","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347071","url":null,"abstract":"In this paper we present a performance evaluation of a large-scale numerical application on a cluster and a global grid/cluster platform. The computational resources are a cluster of clusters (34 nodes, 84 processors) and a local area network grid (128 nodes), distributed on two geographic sites: Tsukuba University (Japan) and University of Lille I (France). We compare a classical MPI (message passing interface) version with global grid/cluster versions. We also present and test some techniques for numerical applications on a grid/cluster infrastructure based on out-of-core programming and an efficient data placement. We discuss the performances of a block-based Gauss-Jordan method for large matrix inversion. As experimental grid middleware we use the XtremWeb system to manage non-dedicated distributed resources","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117003841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Registration and Resource Allocation Mechanisms in High-Performance Application Frameworks","authors":"O. Volberg, J. Larson, R. Jacob, J. Michalakes","doi":"10.1109/CLUSTR.2005.347084","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347084","url":null,"abstract":"Summary form only given. Commodity clusters have enabled ambitious multiphysics or coupled modeling of complex, mutually interacting, computationally intensive systems in science and engineering. Each individual sub-system is represented as a component with its own parallel processor layout and requirements for temporal advance. A central challenge in developing such systems is the parallel coupling problem, which involves overall system architecture and the automation of component registration, distribution of the processor pool between individual components, parallel data transfer and transformation. There currently exist efficient mechanisms for automating parallel data transfer and transformation such as MCT and MPCCI. Mechanisms for top-level system integration, including component registration and resource allocation, scheduling, and control at runtime are less mature and face even greater challenges in heterogeneous environments. We will discuss the numerous architectural choices faced in framework and parallel coupled application development, and will illustrate them through a comparison of these mechanisms in four scientific application frameworks: the community climate system model, the space weather modeling framework, the earth system modeling framework, and the weather research and forecasting model. 
We will then discuss a more sophisticated set of requirements for automating these functions in application frameworks for heterogeneous clusters and computational grids","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132621247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed Out-of-Core Preprocessing of Very Large Microscopy Images for Efficient Querying","authors":"B. Rutt, Vijay S. Kumar, T. Pan, T. Kurç, Ümit V. Çatalyürek, J. Saltz, Yujun Wang","doi":"10.1109/CLUSTR.2005.347054","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347054","url":null,"abstract":"We present a combined task- and data-parallel approach for distributed execution of pre-processing operations to support efficient evaluation of polygonal aggregation queries on digitized microscopy images. Our approach targets out-of-core, pipelined processing of very large images on active storage clusters. Our experimental results show that the proposed approach is scalable both in terms of number of processors and the size of images","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"87 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116304373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}