{"title":"A Cost/Benefit Estimating Service for Mapping Parallel Applications on Heterogeneous Clusters","authors":"D. Katramatos, S. Chapin","doi":"10.1109/CLUSTR.2005.347062","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347062","url":null,"abstract":"Matching the resource requirements of a parallel application to the available resources of a large, heterogeneous cluster is a key requirement in effectively scheduling the application tasks on the nodes of the cluster. This paper describes the cost/benefit estimating service (CBES), a runtime scheduling system targeted at finding highly effective schedules (or mappings) of tasks on nodes. CBES relies on its own infrastructure to gather and maintain static and dynamic information profiles for the computing system and the applications of interest. At the core of CBES is a mapping evaluation module which evaluates candidate application mappings on the basis of shortest execution times. By default, CBES uses a simulated-annealing based scheduler to select mappings. The paper presents the design, initial implementation, and test results of CBES on the Centurion cluster at the University of Virginia and the Orange Grove cluster at Syracuse University. These tests demonstrated that the exploitation of internode communication speed differences due to network heterogeneity can yield speedups of over 10% between same architecture nodes. The maximum observed speedup across architectures for the best vs. worst mapping scenarios of the same application was over 36%, while the corresponding average case speedup was approximately 30%","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117187701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experiment Management and Analysis with perfbase","authors":"Joachim Worringen","doi":"10.1109/CLUSTR.2005.347052","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347052","url":null,"abstract":"Achieving the desired performance with application software, middleware or operating system components on a parallel computer like a cluster is a complex task. Typically, a high-dimensional parameter space has to be reduced to a small number of core parameters, which influence the performance most significantly, but still a large number of experiments is necessary to determine how the best performance can be achieved. Keeping track of these experiments to derive the correct conclusions is a major task. This paper presents perfbase, a set of front end tools and an SQL database as backend, which together form a system for the management and analysis of the output of experiments. In this context, an experiment is an execution of an application or library on a computer system. The output of such an experiment are one or more text files containing information on the execution of the application. This output is the input for perfbase which extracts specified information to store it in the database and make it available for management and analysis purposes in a consistent, fast and flexible manner","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117234362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A pipelined data-parallel algorithm for ILP","authors":"N.A. Fonseca, Fernando M A Silva, V. S. Costa, Rui Camacho","doi":"10.1109/CLUSTR.2005.347059","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347059","url":null,"abstract":"The amount of data collected and stored in databases is growing considerably for almost all areas of human activity. Processing this amount of data is very expensive, both humanly and computationally. This justifies the increased interest both on the automatic discovery of useful knowledge from databases, and on using parallel processing for this task. Multi relational data mining (MRDM) techniques, such as inductive logic programming (ILP), can learn rides from relational databases consisting of multiple tables. However, ILP systems are designed to run in main memory and can have long running times. We propose a pipelined data-parallel algorithm for ILP. The algorithm was implemented and evaluated on a commodity PC cluster with 8 processors. The results show that our algorithm yields excellent speedups, while preserving the quality of learning","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115534876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Effects of Interrupt Throttle Rate on Linux Clusters using Intel Gigabit Network Adapters","authors":"Baris Guler, R. Radhakrishnan, Ronald Pepper","doi":"10.1109/CLUSTR.2005.347089","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347089","url":null,"abstract":"Summary form only given. Many high performance computing clusters (HPCC) are still built using gigabit Ethernet as the interconnect connecting all the computing nodes even though there are faster (lower latency and higher bandwidth) alternatives such as Infiniband and Myrinet. The choice of interconnect mainly depends on the parallel application communication characteristics as well as budget requirements since the faster alternatives are much more expensive compared to gigabit Ethernet especially at lower node counts. Some applications require lower latency interconnect since they communicate more frequently but send relatively small messages, and others can be sending infrequent but large messages thus requiring a higher bandwidth interconnect. Since PCs, workstations and servers are designed for server-client type of environment, network interface card (NIC) drivers are usually optimized for specific network traffic patterns by using several interrupt moderation techniques/parameters, specifically interrupt throttle rate (ITR). Since in an HPCC environment the parallel application communication characteristics (i.e. network traffic patterns) are usually different than the default setting, an ITR value has to be identified to achieve best overall system performance for each type of application. This poster will present the case for why this is an important area in high-performance computing clusters connected using gigabit interconnects. It will present methodologies to tune the interrupt throttle rate parameter given to the driver to achieve a balance between application and network performance. Performance results on typical applications will be shown on different clusters","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114698949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Allocation Scheme for Parallel Applications with Deadline and Security Constraints on Clusters","authors":"T. Xie, X. Qin","doi":"10.1109/CLUSTR.2005.347057","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347057","url":null,"abstract":"Parallel applications with deadline and security constraints are emerging in various areas like education, information technology, and business. However, conventional job schedulers for clusters generally do not take security requirements of realtime parallel applications into account when making allocation decisions. In this paper, we address the issue of allocating tasks of parallel applications on clusters subject to timing and security constraints in addition to precedence relationships. A task allocation scheme, or TAPADS (task allocation for parallel applications with deadline and security constraints), is developed to find an optimal allocation that maximizes quality of security and the probability of meeting deadlines for parallel applications. In addition, we proposed mathematical models to describe a system framework, parallel applications with deadline and security constraints, and security overheads. Experimental results show that TAPADS significantly improves the performance of clusters in terms of quality of security and schedulability over three existing allocation schemes","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133166968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory Management Support for Multi-Programmed Remote Direct Memory Access (RDMA) Systems","authors":"K. Magoutis","doi":"10.1109/CLUSTR.2005.347031","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347031","url":null,"abstract":"Current operating systems offer basic support for network interface controllers (NICs) supporting remote direct memory access (RDMA). Such support typically consists of a device driver responsible for configuring communication channels between the device and user-level processes but not involved in data transfer. Unlike standard NICs, RDMA-capable devices incorporate significant memory resources for address translation purposes. In a multi-programmed operating system (OS) environment, these memory resources must be efficiently shareable by multiple processes. For such sharing to occur in a fair manner, the OS and the device must cooperate to arbitrate access to NIC memory, similar to the way CPUs and OSes cooperate to arbitrate access to translation lookaside buffers (TLBs) or physical memory. A problem with this approach is that today's RDMA NICs are not integrated into the functions provided by OS memory management systems. As a result, RDMA NIC hardware resources are often monopolized by a single application. In this paper, I propose two practical mechanisms to address this problem: (a) Use of RDMA only in kernel-resident I/O subsystems, transparent to user-level software; (b) An extended registration API and a kernel upcall mechanism delivering NIC TLB entry replacement notifications to user-level libraries. Both options are designed to re-instate the multiprogramming principles that are violated in early commercial RDMA systems","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"192 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133948035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Processes Distribution of Homogeneous Parallel Linear Algebra Routines on Heterogeneous Clusters","authors":"J. Cuenca, Luis-Pedro García, D. Giménez, J. Dongarra","doi":"10.1109/CLUSTR.2005.347021","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347021","url":null,"abstract":"This paper presents a self-optimization methodology for parallel linear algebra routines on heterogeneous systems. For each routine, a series of decisions is taken automatically in order to obtain an execution time close to the optimum (without rewriting the routine's code). Some of these decisions are: the number of processes to generate, the heterogeneous distribution of these processes over the network of processors, the logical topology of the generated processes,... To reduce the search space of such decisions, different heuristics have been used. The experiments have been performed with a parallel LU factorization routine similar to the ScaLAPACK one, and good results have been obtained on different heterogeneous platforms","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121416165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward an Optimal Redundancy Strategy for Distributed Computations","authors":"Doug Szajda, Barry Lawson, Jason Owen","doi":"10.1109/CLUSTR.2005.347045","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347045","url":null,"abstract":"Volunteer distributed computations utilize spare processor cycles of personal computers that are connected to the Internet. The related computation integrity concerns are commonly addressed by assigning tasks redundantly. Aside from the additional computational costs, a significant disadvantage of redundancy is its vulnerability to colluding adversaries. This paper presents a tunable redundancy-based task distribution strategy that increases resistance to collusion while significantly decreasing the associated computational costs. Specifically, our strategy guarantees a desired cheating detection probability regardless of the number of copies of a specific task controlled by the adversary. Though not the first distribution scheme with these properties, the proposed method improves upon existing strategies in that it requires fewer computational resources. More importantly, the strategy provides a practical lower bound for the number of redundantly assigned tasks required to achieve a given detection probability","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123736557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topology-Based Meshes for Local Communication in Data-Parallel Applications","authors":"S. Figueira, S. Beeby, A. S. Wu","doi":"10.1109/CLUSTR.2005.347025","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347025","url":null,"abstract":"Data-parallel applications often use a mesh as the communication structure for their local-communication operations. To obtain the best possible performance, when executing on heterogeneous networks, it is important to match this mesh with the topology of the underlying network. This paper presents and compares strategies for matching meshes with heterogeneous networks. Our experiments have shown that some of the strategies presented do decrease the time spent in local communication and improve the performance of data-parallel applications","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121080043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Service Augmentation for High End Interactive Data Services","authors":"M. Wolf, H. Abbasi, Benjamin Collins, David Spain, K. Schwan","doi":"10.1109/CLUSTR.2005.347053","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347053","url":null,"abstract":"Advances in computational science, combined with the increasingly interdisciplinary and geographically distributed research teams, have led to a need to support multi-tiered, data- and meta-data-rich collaboration infrastructures. Our research addresses the interactive, remote tasks undertaken in such collaborations, which require a flexible software infrastructure able to dynamically deploy services where and when needed, and to provide data to clients in the forms in which they require it with suitable levels of end-to-end performance. The concept of service augmentation advanced in this paper seeks to continuously adjust the differences or degrees of incompatibility between the data received and the data displayed or stored by clients. Difference adjustments occur anywhere on the paths between data providers and clients, and compatibility computations leverage all of the resources that may be brought to bear, including CPUs and GPUs on servers and additional data manipulations on server, overlay, and client nodes. A formal structure and experimental evaluations of this concept are performed with the SmartPointer scientific visualization and annotation framework, for which we show that data-driven SLAs provide improved client flexibility and the ability to maintain application-specific notions of quality of service","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"2015 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127754429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}