{"title":"Natural block data decomposition for heterogeneous clusters","authors":"Egor Dovolnov, A. Kalinov, S. Klimov","doi":"10.1109/IPDPS.2003.1213209","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213209","url":null,"abstract":"We propose general purposes natural heuristics for static block and block-cyclic heterogeneous data decomposition over processes of parallel program mapped into multidimensional grid. This heuristics is an extension of the intuitively clear heterogeneous data distribution for one-dimensional case. It is compared to advanced heuristics for heterogeneous data decomposition proposed for solving linear algebra problems on two-dimensional process grid. We experimentally show that for typical local network (12 Windows 2000 PCs interconnected via Fast Ethernet switch) and for typical linear algebra problems these two heuristics have almost the same efficiency. We demonstrate efficiency of the proposed natural decomposition for case of three-dimensional process grid on the example of 3D modeling of supernova explosion.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123918522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An extended link reversal protocol in dynamic networks","authors":"Jie Wu, Fei Dai","doi":"10.1109/IPDPS.2003.1213173","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213173","url":null,"abstract":"We consider the problem of maintaining routing paths between nodes in a dynamic network. Gafni and Bertsekas proposed a link reversal approach called the BG method that maintains a directed acyclic graph (DAG) with a given destination as the sink node. By virtue of built-in redundancy, an updating algorithm to establish a new DAG is activated infrequently and it happens only when the last outgoing link of a host in the DAG is destroyed due to the movement of nodes. In this paper, we propose another updating approach that tries to minimize the total number of reversed links and to maintain routing information without using much extra overhead. The approach maintains a reversed breadth-first tree. Nodes in the network are either marked (inside the tree) or unmarked (outside the tree). When it is too costly to maintain a minimum path for a marked node, the branch rooted at the node is trimmed and the approach then gracefully switches to the BG method. Several extensions are also discussed. A simulation study is conducted to compare the performance of the proposed approach with the existing one.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114371721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wide-area content-based routing mechanism","authors":"Arindam Mitra, Muthucumaru Maheswaran, J. A. Rueda","doi":"10.1109/IPDPS.2003.1213447","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213447","url":null,"abstract":"Content networking is an emerging technology, where the requests for content are steered by content routers that examine not only the destinations but also content descriptors such as URL and cookies. In the current deployments of content networking, content routing is mostly confined to selecting the most appropriate back-end server in virtualized Web server clusters. In this paper, we present an architecture for wide-area content routing. The architecture is based on tagging the requests at ingress points. The tags are designed to incorporate several different attributes of the content in the routing process. Simulations are carried out to compare the performance of the proposed scheme with a DNS-based content access scheme.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114940281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the appropriateness of commodity operating systems for large-scale, balanced computing systems","authors":"R. Brightwell, A. Maccabe, R. Riesen","doi":"10.1109/IPDPS.2003.1213164","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213164","url":null,"abstract":"In the past five years (1997-2002), we have been involved in the design and development of Cplant™. An important goal was to take advantage of commodity approaches wherever possible. In particular, we selected Linux, a commonly available operating system, for the compute nodes of Cplant™. While the use of commodity solutions, including Linux, was critical to the success of Cplant™, we believe that such an approach will not be viable in the development of the next generation of very large-scale systems. We present our definition of a balanced system and discuss several limitations of commodity operating systems in the context of balanced systems. These limitations are categorized into technical limitations (e.g., the structure of the virtual memory system) and social limitations (e.g., the kernel development process). While our direct experience is based on Linux, issues we have identified should be relevant to all commodity operating systems.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117012215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cache pollution in Web proxy servers","authors":"R. Ayani, Y. M. Teo, Yean Seen Ng","doi":"10.1109/IPDPS.2003.1213450","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213450","url":null,"abstract":"Caching has been used for decades as an effective performance enhancing technique in computer systems. The Least Recently Used (LRU) cache replacement algorithm is a simple and widely used scheme. Proxy caching is a common approach to reduce network traffic and delay in many World Wide Web (WWW) applications. However, some characteristics of WWW workloads make LRU less attractive in proxy caching. In the recent years, several more efficient replacement algorithms have been suggested. But, these advanced algorithms require a lot of knowledge about the workloads and are generally difficult to implement. The main attraction of LRU is its simplicity. In this paper we present two modified LRU algorithms and compare their performance with the LRU. Our results indicate that the performance of the LRU algorithm can be improved substantially with very simple modifications.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"77 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123229775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time series forecasting using massively parallel genetic programming","authors":"S. Eklund","doi":"10.1109/IPDPS.2003.1213272","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213272","url":null,"abstract":"In this paper we propose a massively parallel GP model in hardware as an efficient, flexible and scaleable machine learning system. This fine-grained diffusion architecture consists of a large amount of independent processing nodes that evolve a large number of small, overlapping subpopulations. Every node has an embedded CPU that executes a linear machine code GP representation at a rate of up to 20,000 generations per second. Besides being efficient, implementing the system in VLSI makes it highly portable and makes it possible to target mobile, on-line applications. The SIMD-like architecture also makes the system scalable so that larger problems can be addressed with a system with more processing nodes. Finally, the use of GP representation and VHDL modeling makes the system highly flexible and easy to adapt to different applications. We demonstrate the effectiveness of the system on a time series forecasting application.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122011864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Founding FireWire bridges through Promela prototyping","authors":"Izak van Langevelde, J. Romijn, N. Goga","doi":"10.1109/IPDPS.2003.1213434","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213434","url":null,"abstract":"The standardisation procedure of the IEEE P1394.1 Draft Standard for High Performance Serial Bus Bridges is supported through the use of the state-of-the-art model checker Spin, which has been used to simulate the complex net update procedure of the standard, and the use of which will eventually be refined to obtain a solid model checking analysis of the standard. A concise description of net updates is formalised in terms of spanning trees, and it is shown how Spin was used to track down errors in the standard and to gather support for the solutions proposed.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122043779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast and lock-free concurrent priority queues for multi-thread systems","authors":"Håkan Sundell, P. Tsigas","doi":"10.1109/IPDPS.2003.1213189","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213189","url":null,"abstract":"We present an efficient and practical lock-free implementation of a concurrent priority queue that is suitable for both fully concurrent (large multi-processor) systems as well as pre-emptive (multi-process) systems. Many algorithms for concurrent priority queues are based on mutual exclusion. However, mutual exclusion causes blocking which has several drawbacks and degrades the system's overall performance. Non-blocking algorithms avoid blocking, and are either lock-free or wait-free. Previously known non-blocking algorithms of priority queues did not perform well in practice because of their complexity, and they are often based on non-available atomic synchronization primitives. Our algorithm is based on the randomized sequential list structure called Skiplist, and a real-time extension of our algorithm is also described. In our performance evaluation we compare our algorithm with some of the most efficient implementations of priority queues known. The experimental results clearly show that our lock-free implementation outperforms the other lock-based implementations in all cases for 3 threads and more, both on fully concurrent as well as on pre-emptive systems.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"376 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122074674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SMP-aware message passing programming","authors":"J. L. Traff","doi":"10.1109/IPDPS.2003.1213253","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213253","url":null,"abstract":"The message passing interface (MPI) is designed as an architecture independent interface for parallel programming in the shared-nothing, message passing paradigm. We briefly summarize basic requirements to a high-quality implementation of MPI for efficient programming of SMP clusters and related architectures, and discuss possible, mild extensions of the topology functionality of MPI, which, while retaining a high degree of architecture independence, can make MPI more useful and efficient for message-passing programming of SMP clusters. We show that the discussed extensions can all be implemented on top of MPI with very little environmental support.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124125682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An implicitly parallel object-oriented matrix library and its application to medical physics","authors":"Jonas Lätt, B. Chopard","doi":"10.1109/IPDPS.2003.1213460","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213460","url":null,"abstract":"We introduce VLADYMIR, a matrix library that permits the development of array-based code in C++. It is especially useful for numerical simulation tasks and parallelises automatically, without any need for parallelisation-specific instructions. Thanks to the underlying data-parallel model, it shows up an excellent scalability, even on large parallel machines. VLADYMIR has been successfully tested so far on simulations of physical systems by cellular automata and lattice Boltzmann models. In this paper, an application to emboli detection in medical physics is presented. Occurrence of emboli in the brain presents a risk for cardiovascular accidents, which one tries to avoid by ultrasonography methods. Simulations using lattice Boltzmann methods help to analyse the interaction of an embolus with ultrasounds and lead towards a better understanding of the structure of the signal in ultrasonography measurements.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124427356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}