{"title":"A Java dialect free of data races and without annotations","authors":"Luis Mateu","doi":"10.1109/IPDPS.2003.1213143","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213143","url":null,"abstract":"fWe introduce a dialect of Java, JShield, for concurrent object oriented programming, whose primary design goal is robustness. JShield preserves the Java syntax and the semantics of sequential Java programs. It modifies the semantics of concurrent programs to completely avoid data races without relying on the programmer ability, and without requiring special annotations. This is achieved by combining Hoare's monitors with remote method invocations of Java to ensure the proper request of a lock before manipulating shared data. We show that this can be done with a reasonable overhead in execution time compared to Java.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125738111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An object-oriented framework for efficient data access in data intensive computing","authors":"Tuan A. Nguyen, P. Kuonen","doi":"10.1109/IPDPS.2003.1213305","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213305","url":null,"abstract":"Efficient data access is important to achieve high performance in data intensive computing. This paper presents a method of passive data access in the framework of ParoC++-a parallel object-oriented programming environment. ParoC++ extends C++ to distributed environments with the integration of user requirements into parallel objects. Passive data access enables thedata source to initiate and store data directly to a user-specified address space. This ability allows better overlapping between computation and communication by data prediction, partial data processing and auto-data aggregation from multiple sources. Some experiments have been done, showing the scalability and the efficiency of passive data access in ParoC++ compared to direct data access methods.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"399 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124726491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SPMD image processing on Beowulf clusters: directives and libraries","authors":"Paulo F. Oliveira, J. D. Buf","doi":"10.1109/IPDPS.2003.1213419","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213419","url":null,"abstract":"Most image processing algorithms can be parallelized by splitting parallel loops and by using very few communication patterns. Code parallelization using MPI still involves much programming overhead. In order to reduce these overheads, we first developed a small SPMD library (SPMDlib) on top of MPI. The programmer can use the library routines themselves, because they are easy to learn and to apply, even without knowing MPI. However, in order to increase user friendliness, we also develop a small set of parallelization and communication directives/pragmas (SPMDdir), together with a parser that converts these into library calls. SPMDdir is used to develop a new version of SPMDlib. This new version contains much less but generic routines that can be optimized for different network topologies. Extensions for Fortran 90/95 and C are discussed, as well as communication optimizations.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"170 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124735330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cost/performance tradeoffs in network interconnects for clusters of commodity PCs","authors":"C. Kurmann, F. Rauch, T. Stricker","doi":"10.1109/IPDPS.2003.1213360","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213360","url":null,"abstract":"The definition of a commodity component is quite obvious when it comes to the PC as a basic compute engine and building block for clusters of PCs. Looking at the options for a more or less performant interconnect between those compute nodes it is much less obvious which interconnect still qualifies as commodity and which not. We are trying to answer this question based on an in-depth analysis of a few common more or less expensive interconnects on the market. Our measurements and observations are based on the experience of architecting, procuring and installing Xibalba, a 128 node - 192 processor versatile cluster for a variety of research applications in the CS department of ETH Zurich. We define our unique way to measure the performance of an interconnect and use our performance characterization to find the best cost performance point for networks in PC clusters. Since our work is tied to the purchase of a machine at fair market value we can also reliably comment on cost performance of the four types of interconnects we considered. We analyze the reason for performance and non-performance for different Fast Ethernet architectures with a set of micro-benchmarks and conclude our study with performance numbers of some applications. 
Thus, the reader gets an idea about the impact of the interconnect on the overall application performance in commodity PC clusters.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125030410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Novel algorithms for open-loop and closed-loop scheduling of real-time tasks in multiprocessor systems based on execution time estimation","authors":"R. Al-Omari, G. Manimaran, M. Salapaka, Arun Kumar Somani","doi":"10.1109/IPDPS.2003.1213081","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213081","url":null,"abstract":"Most dynamic real-time scheduling algorithms are open-loop in nature meaning that they do not dynamically adjust their behavior using the performance at run-time. When accurate workload models are not available, such a scheduling can result in a highly underutilized system based on an extremely pessimistic estimation of workload. In recent years, \"closed-loop\" scheduling is gaining importance due to its applicability to many real-world problems wherein the feedback information can be exploited efficiently to adjust system parameters, thereby improving the performance. In this paper, we first propose an open-loop dynamic scheduling algorithm that employs overlap in order to provide flexibility in task execution times. Secondly, we propose a novel closed-loop approach for dynamically estimating the execution time of tasks based on both deadline miss ratio and task rejection ratio. This approach is highly preferable for firm real-time systems since it provides a firm performance guarantee. We evaluate the performance of the open-loop and the closed-loop approaches by simulation and modeling. 
Our studies show that the closed-loop scheduling offers a significantly better performance (20% gain) over the open-loop scheduling under all the relevant conditions we simulated.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129794553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-stabilizing protocols for maximal matching and maximal independent sets for ad hoc networks","authors":"W. Goddard, S. Hedetniemi, D. P. Jacobs, P. Srimani","doi":"10.1109/IPDPS.2003.1213302","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213302","url":null,"abstract":"We propose two distributed algorithms to maintain, respectively, a maximal matching and a maximal independent set in a given ad hoc network; our algorithms are fault tolerant (reliable) in the sense that the algorithms can detect occasional link failures and/or new link creations in the network (due to mobility of the hosts) and can readjust the global predicates. We provide time complexity analysis of the algorithms in terms of the number of rounds needed for the algorithm to stabilize after a topology change, where a round is defined as a period of time in which each node in the system receives beacon messages from all its neighbors. In any ad hoc network, the participating nodes periodically transmit beacon messages for message transmission as well as to maintain the knowledge of the local topology at the node; as a result, the nodes get the information about their neighbor nodes synchronously (at specific time intervals). 
Thus, the paradigm to analyze the complexity of the self-stabilizing algorithms in the context of ad hoc networks is very different from the traditional concept of an adversary daemon used in proving the convergence and correctness of self-stabilizing distributed algorithms in general.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128184912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experiences and lessons learned with a portable interface to hardware performance counters","authors":"J. Dongarra, K. London, S. Moore, P. Mucci, D. Terpstra, Haihang You, Min Zhou","doi":"10.1109/IPDPS.2003.1213517","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213517","url":null,"abstract":"The PAPI project has defined and implemented a cross-platform interface to the hardware counters available on most modern microprocessors. The interface has gained widespread use and acceptance from hardware vendors, users, and tool developers. This paper reports on experiences with the community-based open-source effort to define the PAPI specification and implement it on a variety of platforms. Collaborations with tool developers who have incorporated support for PAPI are described. Issues related to interpretation and accuracy of hardware counter data and to the overheads of collecting this data are discussed. The paper concludes with implications for the design of the next version of PAPI.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128260999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An approach to optimizing adaptive parabolic PDE solvers for the Grid","authors":"Vikram S. Adve, J. Browne, Brian Ensink, J. Rice, P. Teller, M. Vernon, Stephen J. Wright","doi":"10.1109/IPDPS.2003.1213385","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213385","url":null,"abstract":"The method of lines is a widely used algorithm for solving parabolic partial differential equations that could benefit greatly from implementation on Grid computing environments. This paper outlines the issues involved in executing method-of-lines codes on a Grid and in developing model-driven adaptive control strategies for these codes. We have developed a parameterizable benchmark called MOL that captures a wide range of realistic method-of-lines codes. We are using this benchmark to develop performance models that can be used to achieve specific optimality criteria under the available (and dynamically varying) resources of a Grid environment, and under user-specified goals for solution error and computational rate-of-progress. We are developing a componentization strategy that can enable effective adaptive control of MOL, as well as language and compiler support that can simplify the development of adaptive distributed applications. 
If successful, this work should yield a much better understanding than we have at present of how an important class of parallel numerical applications can be executed effectively in a dynamic Grid environment.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129356406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tornado: a capability-aware peer-to-peer storage network","authors":"Hung-Chang Hsiao, C. King","doi":"10.1109/IPDPS.2003.1213171","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213171","url":null,"abstract":"Peer-to-peer storage networks aim at aggregating the unused storage in today's resource-abundant computers to form a large, shared storage space. To lay over the extremely variant machines, networks and administrative organizations, peer-to-peer storage networks must be aware of the capabilities of the constituent components to leverage their resources, performance and reliability. This paper reports our design of such a peer-to-peer storage network, called Tornado. Tornado is built on top of two concepts. The first is the virtual home concept, which adds an extra level of abstraction between data and storage nodes to mask the underlying heterogeneity. The second concept is the classification of the storage nodes into \"good\" and \"bad\" according to their static and dynamic capabilities. Only \"good\" peers can host virtual homes, whereby introducing quality of services into the storage network. We evaluate Tornado via simulation. The results show that Tornado is comparable with previous systems, where each route takes at most [log N] hops, anew node takes [log N]/sup 2/ messages to join, and the memory overhead in each node is O(log N). 
Moreover, Tornado is able to provide comprehensive services with features scattered in different systems previously, and takes account of and exploits the heterogeneity in the underlying network environment.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129639622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Current trends in high performance parallel and distributed computing","authors":"V. Sunderam","doi":"10.1109/IPDPS.2003.1213452","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213452","url":null,"abstract":"Summary form only given. Parallel computing for high performance scientific applications gained widespread adoption and deployment about two decades ago. Computer systems based on shared memory and message passing parallel architectures were soon followed by clusters and loosely coupled workstations, that afforded flexibility and good performance for many applications at a fractional cost of MPP. Such platforms, referred to as parallel distributed computing systems, have evolved considerably and are currently manifested as very sophisticated metacomputing and Grid systems. This paper traces the evolution of loosely coupled systems and highlights specific functional, as well as fundamental, differences between clusters and NOW of yesteryear versus metacomputing Grids of today. In particular, semantic differences between Grids and systems such as PVM and MPICH are explored. In addition, the recent trend in Grid frameworks to move away from conventional parallel programming models to a more service-oriented architecture is discussed. Exemplified by toolkits that follow the OGSA specification, these efforts attempt to unify aspects of Web-service technologies, high performance computing, and distributed systems in order to enable large scale, cross-domain sharing of compute, data, and service resources. 
The paper also presents specific examples of current metacomputing and Grid systems with respect to the above characteristics, and discusses the relative merits of different approaches.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128887352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}