{"title":"Adaptive Checkpointing for Master-Worker Style Parallelism","authors":"G. Cooperman, Jason Ansel, Xiaoqin Ma","doi":"10.1109/CLUSTR.2005.347096","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347096","url":null,"abstract":"We present a transparent, system-level checkpointing solution for master-worker parallelism that automatically adapts, upon restore, to the number of processor nodes available. We call this adaptive checkpointing. This is important, since nodes in a cluster fail. It also allows one to adapt to using multiple cluster partitions, as they become available. Checkpointing a master-worker computation has the additional advantage of needing to checkpoint only the master process. This is both fast (0.05 s in our case), and more economical of disk space. We describe a system-level solution. The application writer does not declare what data structures to checkpoint. Furthermore, the solution is transparent. The application writer need not add code to request a checkpoint at appropriate locations. The system-level strategy avoids the labor-intensive and error-prone work of explicitly checkpointing the many data structures of a large program","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126733161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantum Trajectories with Dynamic Loop Scheduling and Reinforcement Learning","authors":"R. Cariño, I. Banicescu, J. P. Pabico, M. Rashid","doi":"10.1109/CLUSTR.2005.347015","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347015","url":null,"abstract":"The study of many problems in quantum mechanics is based on finding the solution to the time-dependent Schrodinger equation which describes the dynamics of quantum-mechanical systems composed of a particle of mass m moving in a potential V. Based on the hydrodynamic interpretation of quantum mechanics by Bohm (1952), an unstructured grid approach, the quantum trajectory method (QTM) was developed by Lopreore and Wyatt (1999). Derivatives needed for updating the equations of motion are obtained using curve-fitting by a moving weighted least squares algorithm, and analytically differentiating the least squares curves. The calculations involve computationally-intensive parallel loops with nonuniform iterate execution times","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115316861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Load Balancing using Grid-based Peer-to-Peer Parallel I/O","authors":"Yijian Wang, D. Kaeli","doi":"10.1109/CLUSTR.2005.347040","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347040","url":null,"abstract":"In the area of grid computing, there is a growing need to process large amounts of data. To support this trend, we need to develop efficient parallel storage systems that can provide high performance for data-intensive applications. In order to overcome I/O bottlenecks and to increase I/O parallelism, data streams need to be parallelized at both the application level and the storage device level. In this paper, we propose a novel peer-to-peer (P2P) storage architecture for MPI applications on grid systems. We first present an analytic model of our P2P storage architecture. Next, we describe a profile-guided data allocation algorithm that can increase the degree of I/O parallelism present in the system, as well as balance I/O in a heterogeneous system. We present results on an actual implementation. Our experimental results show that by partitioning data across all available storage devices and carefully tuning I/O workloads in the grid system, our peer-to-peer scheme can deliver scalable high performance I/O that can address I/O-intensive workloads","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124747525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A TCO Analysis of Cluster Servers Based on TPC-H Benchmarks","authors":"E. Capuano","doi":"10.1109/CLUSTR.2005.347080","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347080","url":null,"abstract":"Summary form only given. In this article the author presents a TPC-H benchmark-based cost analysis of cluster servers as a computing configuration alternative to consolidated servers available in the market. The analytical model is expressed with mathematical equations to elicit the issue and demonstrate the results, starting from Gartner's total cost of ownership (TCO) concept and factoring its cost items in terms of hardware and software costs in a server computing approach. The TPC-H benchmark data employed to test the model show that a cluster server configuration may not be a good idea in some computing scenarios, mainly in medium to large database systems, but could be the best fit, economically speaking, in scenarios where commodity hardware and license-free software are employed to implement the cluster. And, reinforcing former research of other authors, the model shows that software, much more than hardware, is the key set of components in an economic analysis of a server cluster","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125361040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Case for Dynamic Page Migration in Multiple-Writer Software DSM Systems","authors":"T. Repantis, C. Antonopoulos, V. Kalogeraki, T. Papatheodorou","doi":"10.1109/CLUSTR.2005.347077","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347077","url":null,"abstract":"Software DSMs (SDSMs) are an appealing alternative to message passing, since they facilitate the programmability of clusters. However, the ease of programming comes at the expense of performance. Although accesses of data that reside in the memory of remote nodes are transparent to the programmer, they suffer from significantly higher latencies compared to local accesses. As a consequence, it is desirable to move data as close as possible to the nodes that need them most. In this paper we introduce a protocol for dynamically migrating memory pages in home-based SDSM systems. In these systems each page has a designated home node; yet our protocol allows a node that heavily modifies a page to become its new home. The new protocol targets multiple-writer DSMs, i.e. DSMs that allow multiple nodes to concurrently modify the same page. The process is dynamic and transparent to the applications programmer. Moreover, it does not assume a specific consistency protocol. Experimental results show that our page migration protocol reduces remote page modifications, decreases the average memory access latency, as well as the overhead for the preservation of memory consistency. The benefit for the end-user is a significant improvement in application performance","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128544184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SymbioticSphere: Towards an Autonomic Grid Network System","authors":"P. Champrasert, Chonho Lee, J. Suzuki","doi":"10.1109/CLUSTR.2005.347086","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347086","url":null,"abstract":"This paper describes SymbioticSphere, a novel biologically-inspired architecture that allows grid systems (application services and middleware platforms) to be scalable and adaptive to dynamic network environments. In SymbioticSphere, each service and platform is designed as a biological entity, analogous to an individual bee in a bee colony. Services and platforms implement biological concepts and mechanisms such as decentralization, emergence, energy exchange, migration, replication and death. As in biological systems, desirable system characteristics such as scalability and adaptability emerge from collective actions and interactions of services and platforms","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134020630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Discovery of Brokers in Distributed Messaging Infrastructures","authors":"S. Pallickara, H. Gadgil, G. Fox","doi":"10.1109/CLUSTR.2005.347076","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347076","url":null,"abstract":"Increasingly, messaging infrastructures are being used to support the communication requirements of a wide variety of clients, services, and proxies thereto. Typically, for various reasons, this messaging infrastructure is a distributed one with multiple constituent brokers. In the paper we present our scheme for the discovery of brokers in distributed messaging infrastructures based on the publish/subscribe paradigm. We also include empirical results from our experiments related to the implementation of our scheme","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131206722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Load Balancing Scheme for Cluster-based Secure Network Servers","authors":"Jin-Ha Kim, G. S. Choi, C. Das","doi":"10.1109/CLUSTR.2005.347056","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347056","url":null,"abstract":"Although the secure sockets layer (SSL) is the most popular protocol to provide a secure channel between a client and a cluster-based network server, its high overhead degrades the server performance considerably, and thus, affects the server scalability. Therefore, improving the performance of SSL-enabled network servers is critical for designing scalable and high performance data centers. In this paper, we examine the impact of SSL offering and SSL-session aware distribution in cluster-based network servers. We propose a backend forwarding scheme, called ssl_with_bf, that employs a low-overhead user-level communication mechanism like VIA to achieve good load balance among server nodes. We compare three distribution models for network servers: round robin (RR), ssl_with_session and ssl_with_bf through simulation. The experimental results with 16-node and 32-node cluster configurations show that while session reuse of ssl_with_session is critical to improve the performance of application servers, the proposed backend forwarding scheme can further enhance the performance due to better load balancing. The ssl_with_bf scheme can minimize average latency by about 40% and improve throughput across a variety of workloads","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126338073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Static scheduling of dependent parallel tasks on heterogeneous clusters","authors":"Jorge G. Barbosa, C. Morais, Rui Nóbrega, A. P. Monteiro","doi":"10.1109/CLUSTR.2005.347024","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347024","url":null,"abstract":"This paper addresses the problem of scheduling parallel tasks, represented by a directed acyclic graph (DAG) on heterogeneous clusters. Parallel tasks, also called malleable tasks, are tasks that can be executed on any number of processors with its execution time being a function of the number of processors allotted to it. The scheduling of independent parallel tasks on homogeneous machines has been extensively studied and the case of parallel tasks with precedence constraints has been studied for tree-like graphs. For arbitrary precedence graphs and for heterogeneous machines, the optimization problem is more complex because the processing time of a given task depends on the number of processors and on the total processing capacity of those processors. This paper presents a list scheduling algorithm to minimize the total length of the schedule (makespan) of a given set of parallel tasks, whose dependencies are represented by a DAG","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125443595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Performance Analysis of the Ammasso RDMA Enabled Ethernet Adapter and its iWARP API","authors":"D. Dalessandro, P. Wyckoff","doi":"10.1109/CLUSTR.2005.347028","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347028","url":null,"abstract":"Network speeds are increasing well beyond the capabilities of today's CPUs to efficiently handle the traffic. This bottleneck at the CPU causes the processor to spend more of its time handling communication and less time on actual processing. As network speeds reach 10 Gb/s and more, the CPU simply cannot keep up with the data. Various methods have been proposed to solve this problem. High performance interconnects, such as Infiniband, have been developed that rely on RDMA and protocol offload in order to achieve higher throughput and lower latency. In this paper we evaluate the feasibility of a similar approach which, unlike existing high performance interconnects, requires no special infrastructure. RDMA over Ethernet, otherwise known as iWARP, facilitates the zero copy exchange of data over ordinary local area networks. Since it is based on TCP, iWARP enables RDMA in the wide area network as well. This paper provides a look into the performance of one of the earliest commodity implementations of this emerging technology, the Ammasso 1100 RNIC","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129907207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}