{"title":"Near Overhead-free Heterogeneous Thread-migration","authors":"R. Veldema, M. Philippsen","doi":"10.1109/CLUSTR.2005.347042","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347042","url":null,"abstract":"Thread migration moves a single call-stack to another machine to improve either load balancing or locality. Current approaches for checkpointing and thread migration are either not heterogeneous or they introduce large runtime overhead. In general, previous approaches add overhead by instrumenting each function in a program. The instrumentation costs are then even incurred when no thread migration is performed. In this respect our system is near-overhead free: nearly no overhead is caused if no migration is performed. Our implementation instead generates meta-functions for each location in the code where a function is called. These functions portably save and rebuild activation records to and from a machine-independent format. Each variable of an activation record is described in terms of its usages in a machine-independent `usage descriptor string' to enable heterogeneous, near overhead free thread migration with as few as possible changes to a compiler. Our resulting thread migration solution is, for example, able to move a thread between an x86 machine (few registers, 32 bits) and an Itanium machine (many registers, 64 bits). 
Furthermore, we (optionally) move the decision on when and where to migrate to the application programmer instead of implementing a fixed 'fits-all' heuristics as in previous approaches","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116080532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Generic Proxy Mechanism for Secure Middlebox Traversal","authors":"Se-Chang Son, M. Farrellee, M. Livny","doi":"10.1109/CLUSTR.2005.347055","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347055","url":null,"abstract":"Firewalls/NATs have brought significant connectivity problems along with their benefits, causing many applications to break or become inefficient. Due to its bi-directional communication, huge scale, and multi-organizational nature, the grid may be one of the areas damaged most by the connectivity problem. Several ideas to deal with the connectivity problem were investigated and many systems are available. However, many issues still remain unanswered. Most systems are firewall/NAT unfriendly and are considered harmful to network security; the tussle between these devices trying to investigate pay loads and applications trying to protect their content from observation and modification must be reconciled. This paper discusses how a simple relay-based system, called XRAY (middlebox traversal by relaying), deals with these issues and provides other benefits such as flexible traffic control. This paper also discusses how relay-based traversal systems can help applications to communicate over firewalls/NATs and also complement firewall/NAT operations to help network security","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124190427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward global and grid computing for large scale linear algebra problems","authors":"L. Choy, S. Petiton","doi":"10.1109/CLUSTR.2005.347026","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347026","url":null,"abstract":"Platforms of global computing and grid computing have been developed in parallel without any link between them. However, they can be considered as complementary tools and it would be interesting to gather their resources. Moreover, we focus on linear algebra which has been seldom fitted to global computing. In particular, we aim the eigen-problem which is often used by the applications of industrial companies and laboratories. In this paper, we unify those two goals. We fit the algorithm of bisection on a platform of global computing, XtremWeb, and on a platform of RPC programming, OmniRPC. Those software are deployed on two different geographic sites at the engineer school of Poly tech 'Lille, France, and at the HPCS laboratory of Tsukuba, Japan. The combination of two different software and two geographic sites allows to do a wide range of tests and, then, to analyze them and compare all configurations","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131333100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Framework for Statistical Analysis of Datasets on Heterogeneous Clusters","authors":"R. Cariño, I. Banicescu","doi":"10.1109/CLUSTR.2005.347019","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347019","url":null,"abstract":"This paper proposes a framework for the statistical analysis of multiple related datasets on heterogeneous clusters. The set of processors assigned to the framework are partitioned into groups according to rack locations, with the group sizes being chosen to match the degree of concurrency in the analysis procedure. The datasets are initially divided among the groups. Dynamic loop scheduling is employed to address load imbalance arising from the differences in computational powers of groups, the variability of dataset sizes, and the unpredictable irregularities in the cluster environment. Results of preliminary tests indicate the effectiveness of the framework in fitting gamma-ray burst datasets with vector functional coefficient autoregressive time series models on 64 processors of a heterogeneous general-purpose Linux cluster","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123106595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Biologically-inspired Adaptation Mechanism for Autonomic Grid Networks","authors":"Chonho Lee, P. Champrasert, J. Suzuki","doi":"10.1109/CLUSTR.2005.347083","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347083","url":null,"abstract":"Summary form only given. This poster presentation describes and empirically evaluates a biologically-inspired adaptation mechanism that allows grid network services to autonomously adapt to dynamic environment changes in the network (e.g. changes in network traffic and resource availability). Based on the observation that the natural immune system has elegantly achieved autonomous adaptation, the proposed mechanism, called the iNet artificial immune system, is designed after the mechanisms behind how the natural immune system detects antigens (e.g. viruses) and specifically reacts to them. iNet models a behavior of grid network services (e.g. migration and replication) as an antibody, and an environment condition (e.g. network traffic and resource availability) as an antigen. iNet allows each grid network service to (1) autonomously sense its surrounding environment conditions (i.e. antigens) to evaluate whether it adapts well to the current conditions, and if it does not, (2) adaptively perform a behavior (i.e. antibody) suitable for the conditions (i.e. antigens). This poster presents the iNet architecture and its algorithm design. It also shows several empirical experimental results. 
They show that iNet works efficiently with a small memory footprint and makes grid network services adaptive by dynamically changing their population and location in response to environmental changes in the network","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"173 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123090517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Parallelization with pMapper","authors":"N. Travinin, H. Hoffmann, R. Bond, H. Chan, J. Kepner, E. Wong","doi":"10.1109/CLUSTR.2005.347017","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347017","url":null,"abstract":"Algorithm implementation efficiency is key to delivering high-performance computing capabilities to demanding, high throughput signal and image processing applications and simulations. Significant progress has been made in optimization of serial programs, but many applications require parallel processing, which brings with it the difficult task of determining efficient mappings of algorithms. The pMapper infrastructure addresses the problem of performance optimization of multistage MATLABreg applications on parallel architectures. pMapper is an automatic performance tuning library written as a layer on top of pMatlab: Parallel Matlab Toolbox. While pMatlab abstracts the message-passing interface, the responsibility of mapping numerical arrays falls on the user. Choosing the best mapping for a set of numerical arrays is a nontrivial task that requires significant knowledge of programming languages, parallel computing, and processor architecture. pMapper automates the task of map generation. This abstract addresses the design details of pMapper and presents preliminary results","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134396028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transparent Checkpoint-Restart of Distributed Applications on Commodity Clusters","authors":"Oren Laadan, Dan B. Phung, Jason Nieh","doi":"10.1109/CLUSTR.2005.347039","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347039","url":null,"abstract":"We have created ZapC, a novel system for transparent coordinated checkpoint-restart of distributed network applications on commodity clusters. ZapC provides a thin visualization layer on top of the operating system that decouples a distributed application from dependencies on the cluster nodes on which it is executing. This decoupling enables ZapC to checkpoint an entire distributed application across all nodes in a coordinated manner such that it can he restarted from the checkpoint on a different set of cluster nodes at a later time. ZapC checkpoint-restart operations execute in parallel across different cluster nodes, providing faster checkpoint-restart performance. ZapC uniquely supports network state in a transport protocol independent manner, including correctly saving and restoring socket and protocol state for both TCP and UDP connections. We have implemented a ZapC Linux prototype and demonstrate that it provides low visualization overhead and fast checkpoint-restart times for distributed network applications without any application, library, kernel, or network protocol modifications","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131327717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Macro-Dataflow using Software Distributed Shared Memory","authors":"Hiroshi Tanabe, H. Honda, T. Yuba","doi":"10.1109/CLUSTR.2005.347078","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347078","url":null,"abstract":"Macro-dataflow processing, which exploits the parallelism among coarse-grain tasks (macrotasks) such as loops and subroutines, is considered promising to break the performance limits of loop parallelism. To realize macro-dataflow processing on distributed memory systems, \"data reaching conditions\", a method to make the sender-receiver pair of a data transfer determined at runtime, has previously been proposed. However, irregular data accesses induce extra data transfers, which lead to performance deterioration. This paper proposes an implementation method using software distributed shared memory, which enables on-demand data fetching. This paper describes the implementation using two well-accepted, page-based software distributed shared memory systems, TreadMarks and JI-AJIA. Evaluation results on a PC cluster show the software distributed memory approach is as much as 25% faster than the data reaching conditions","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"43 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131687162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Restricted Slow-Start for TCP","authors":"W. Allcock, Sanjay Hegde, Rajkumar Kettimuthul","doi":"10.1109/CLUSTR.2005.347079","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347079","url":null,"abstract":"In network protocol research a common goal is optimal bandwidth utilization, while still being network friendly. The drawback of TCP in networks with large bandwidth-delay products due to its AIMD based congestion control mechanism is well known. The congestion control algorithm of TCP has two phases namely slow-start phase and congestion-avoidance phase. Many researchers have focused on modifying the congestion avoidance phase of the algorithm. In this work, we propose a modification to the slow-start phase of the algorithm to achieve better performance. Restricted slow-start algorithm is a simple sender side alteration to the TCP congestion window update algorithm","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"25 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131805515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical CPU Schedulers for Multiprocessor Systems, Fair CPU Scheduling and Processes Isolation","authors":"K. Korotaev","doi":"10.1109/CLUSTR.2005.347085","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347085","url":null,"abstract":"Summary form only given. Modern multi-user systems are able to run concurrently hundreds of applications, but provide little isolation between users. Absence of isolation can result in a poor performance for some users or accidental or intentional denial-of-service. In modern computational clusters these issues are usually prevented with the help of jobs and job schedulers, which postpone a job and schedule it later, when resources are available. The author proposes to extend CPU/job schedulers to meet interactive multi-user system requirements where application latencies should be low enough (< 1 sec) and at the same time provide isolation between users or group of processes and quality-of-service control. As an example, such scheduler can be used for running virtual private servers on top of cluster or NUMA system. Most models developed so far (e.g. (Chandra et al., 2000)) propose hierarchical CPU schedulers based on SFQ-like algorithms (for fairness between process groups) and standard OS scheduler (for processes inside the groups). Such multilevel schedulers imply independence of internal schedulers, i.e. independence of their scheduling decisions. This results in a number of problems and inefficiencies (Korotaev and Savochkin, 2005) when fair scheduler and Linux O(1)-scheduler are stacked together (e.g. fair scheduler should take into account load on per-CPU runqueues, etc.). The author extended the model of hierarchical CPU schedulers with a new scheduling level called virtual CPU scheduler (Korotaev and Savochkin, 2005). This new middle level allows top-level fair scheduler to be independent in its decisions from standard CPU scheduler and introduces the notion of virtual and physical CPUs. 
The top-level fair scheduler decides how to schedule virtual CPUs on physical CPUs, then the middle-level scheduler selects a virtual CPU, and the standard Linux scheduler selects a process from the virtual CPU's runqueue. Additionally, the author proposes some new CPU scheduler QoS parameters, such as a parameter limiting the CPU share a user can get even if there are spare CPU resources. Such QoS parameters provide better control over CPU apportioning and allow limiting a user to a given CPU share. As an example, this can be used to limit a user's CPU consumption to the paid share of the total CPU power","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131223406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}