{"title":"An object oriented framework for an associative model of parallel computation","authors":"Michael Scherger, J. Potter, J. Baker","doi":"10.1109/IPDPS.2003.1213309","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213309","url":null,"abstract":"An object oriented description and framework of the Multiple ASsociative Computing (MASC) model of parallel computation is presented. This description identifies MASC objects and specifies various object and inter-object relationships, dependencies, and behaviors. This was achieved by describing various views of the MASC model by using many of the UML structural and behavioral diagrams. This object oriented framework has been highly useful in designing an implementation of a runtime environment for the MASC model. Also the object oriented framework has been highly effective for further parallel modeling techniques, comparisons to other parallel models, MASC parallel system software research, and MASC algorithm development.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115811866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient agent-based multicast on wormhole switch-based irregular networks","authors":"Yi-Fang Lin, Pangfeng Liu","doi":"10.1109/IPDPS.2003.1213172","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213172","url":null,"abstract":"This paper describes an agent-based approach for scheduling multiple multicasts on wormhole switch-based networks. Multicast/broadcast is an important communication pattern, with applications in collective communication operations such as barrier synchronization and global combining. Our approach assigns an agent to each subtree of switches such that the agents can exchange information efficiently and independently. The entire multicast problem is then recursively solved with each agent sending messages to those switches that it is responsible for. In this way, communication is localized by the assignment of agents to subtrees. This idea can be easily generalized to multiple multicasts since the order of message passing among agents can be interleaved for different multicasts. We conduct experiments to demonstrate the efficiency of our approach by comparing the results with SPCCO, a highly efficient multicast algorithm. We found that SPCCO suffers from link contention when the number of simultaneous multicasts becomes large. On the other hand, our agent-based approach achieves better performance in large cases.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132375055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Phylogenetic tree inference on PC architectures with AxML/PAxML","authors":"A. Stamatakis, T. Ludwig","doi":"10.1109/IPDPS.2003.1213296","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213296","url":null,"abstract":"Inference of phylogenetic trees comprising hundreds or even thousands of organisms based on the maximum likelihood method is computationally extremely expensive. In previous work, we have introduced subtree equality vectors (SEV) to significantly reduce the number of required floating point operations during topology evaluation and implemented this method in (P)AxML, which is a derivative of (parallel) fastDNAml. Experimental results show that (P)AxML scales particularly well on inexpensive PC-processor architectures obtaining global run time accelerations between 51% and 65% over (parallel) fastDNAml for large data sets, yet rendering exactly the same output. In this paper, we present an additional SEV-based algorithmic optimization which scales well on PC processors and leads to a further improvement of global execution times of 14% to 19% compared to the initial version of AxML. Furthermore, we present novel distance-based heuristics for reducing the number of analyzed tree topologies, which further accelerate the program by 4% up to 8%. Finally, we discuss a novel experimental tree-building algorithm and potential heuristic solutions for inferring large high quality trees, which for some initial tests rendered better trees and accelerated program execution at the same time by a factor greater than 6.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130256042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extending OpenMP to support slipstream execution mode","authors":"K. Ibrahim, G. Byrd","doi":"10.1109/IPDPS.2003.1213119","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213119","url":null,"abstract":"OpenMP has emerged as a widely accepted standard for writing shared memory programs. Hardware-specific extensions such as data placement are usually needed to improve the scalability of applications based on this standard. This paper investigates the implementation of an OpenMP compiler that supports slipstream execution mode, a new optimization mechanism for CMP-based distributed shared memory multiprocessors. Slipstream mode uses additional processors to reduce communication overhead, rather than to increase parallelism. We discuss how each OpenMP construct can be implemented to take advantage of slipstream mode, and we present a minor extension that allows runtime or compile-time control of slipstream execution. We also investigate the interaction between slipstream mechanisms and OpenMP scheduling. Our implementation supports both static and dynamic scheduling in slipstream mode. We extended the Omni OpenMP compiler to generate binaries that support slipstream mode, and we show the performance of slipstream-enabled codes using OpenMP codes from the NAS Parallel Benchmark suite, running on the SimOS simulator. Our extension to OpenMP allowed the benchmarks to achieve an average performance improvement of 14% with static scheduling. For dynamic scheduling the performance improvement is 12% on average.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134450228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A polymorphic hardware platform","authors":"P. Beckett","doi":"10.1109/IPDPS.2003.1213322","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213322","url":null,"abstract":"In the domain of spatial computing, it appears that platforms based on either reconfigurable datapath units or on hybrid microprocessor/logic cell organizations are in the ascendancy as they appear to offer the most efficient means of providing resources across the greatest range of hardware designs. This paper encompasses an initial exploration of an alternative organization. It looks at the effect of using a very fine-grained approach based on a largely undifferentiated logic cell that can be configured to operate as a state element, logic or interconnect - or combinations of all three. A vertical layout style hides the overheads imposed by reconfigurability to an extent where very fine-grained organizations become a viable option. It is demonstrated that the technique can be used to develop building blocks for both synchronous and asynchronous circuits, supporting the development of hybrid architectures such as globally asynchronous, locally synchronous.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134157408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A characterisation of optimal channel assignments for wireless networks modelled as cellular and square grids","authors":"M. Shashanka, Amrita Pati, Anil M. Shende","doi":"10.1109/IPDPS.2003.1213406","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213406","url":null,"abstract":"In this paper we first present a uniformity property that characterises optimal channel assignments for networks arranged as cellular or square grids. Then, we present optimal channel assignments for cellular and square grids; these assignments exhibit a high value for δ_1 - the separation between channels assigned to adjacent stations. Based on empirical evidence, we conjecture that the value our assignments exhibit is an upper bound on δ_1.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134158996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Some modular adders and multipliers for field programmable gate arrays","authors":"Jean-Luc Beuchat","doi":"10.1109/IPDPS.2003.1213353","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213353","url":null,"abstract":"This paper is devoted to the study of number representations and algorithms leading to efficient implementations of modular adders and multipliers on recent field programmable arrays. Our hardware operators take advantage of the building blocks available in such devices: carry-propagate adders, memory blocks, and sometimes embedded multipliers. The first part of the paper describes three basic methodologies to carry out a modulo m addition and presents in more detail the design of modulo (2^n ± 1) adders. The major result is a novel modulo (2^n + 1) addition algorithm leading to an area-time efficient implementation of this arithmetic operation on FPGAs. The second part describes a modulo m multiplication algorithm involving small multipliers and memory blocks, and modulo (2^n + 1) multipliers based on Ma's algorithm. We also suggest some improvements of this operator in order to perform a multiplication in the group (Z*_{2^n+1}, ·).","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"165 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130959933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed hardware-in-the-loop simulator for autonomous continuous dynamical systems with spatially constrained interactions","authors":"Z. Papp, M. Dorrepaal, D. Verburg","doi":"10.1109/IPDPS.2003.1213235","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213235","url":null,"abstract":"The state-of-the-art intelligent vehicle, autonomous guided vehicle and mobile robotics application domains can be described as a collection of interacting highly autonomous complex dynamical systems. Extensive formal analysis of these systems - except special cases - is not feasible, consequently the availability of proper simulation and test tools is of primary importance. This research targets the real-time hardware-in-the-loop (HIL) simulation of vehicle and mobile robot systems. To a certain extent distributed virtual environment (DVE) systems are attempting to satisfy similar requirements but a few distinctive features set this approach apart. DVE systems put the emphasis on load balancing and communication overhead. In our case the emphasis is on the temporal predictability and guaranteed, timed execution of the experiment. The paper describes a simulation framework dedicated to HIL simulation of continuous dynamical entities with spatially constrained interactions. The underlying modelling concept is introduced. The runtime infrastructure is described, which allows for distributed execution of the models.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133515000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Routing on meshes in optimum time and with really small queues","authors":"Bogdan S. Chlebus, J. F. Sibeyn","doi":"10.1109/IPDPS.2003.1213148","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213148","url":null,"abstract":"We consider permutation routing problems on 2D and 3D mesh-connected computers with side length n. Our main result is a deterministic online algorithm routing on 2D meshes, operating in worst-case time T = 2n + O(1) and with queue size Q = 3. We also develop offline routing algorithms with performance bounds T = 2n - 1 and Q = 2 for 2D meshes, and T = 3n - 2 and Q = 4 for 3D meshes. We also show that it is possible to route most of the permutations on 2D meshes offline in time T = 2n - 2 with Q = 1.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132145146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic organization schemes for cooperative proxy caching","authors":"S. Bakiras, Thanasis Loukopoulos, I. Ahmad","doi":"10.1109/IPDPS.2003.1213136","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213136","url":null,"abstract":"In a generic cooperative caching architecture, web proxies form a mesh network. When a proxy cannot satisfy a request, it forwards the request to the other nodes of the mesh. Since a local cache cannot fulfill the majority of the arriving requests (typical values of the local hit ratio are about 30-50%), the volume of queries diverted to neighboring nodes can substantially grow and may consume a considerable amount of system resources. A proxy does not need to cooperate with every node of the mesh for the following reasons: (i) the traffic characteristics may be highly diverse; (ii) the contents of some nodes may extensively overlap; (iii) the inter-node distance might be too large. Furthermore, organizing N proxies in a mesh topology introduces scalability problems, since the number of queries is of the order of N^2. Therefore, restricting the number of neighbors for each proxy to k < N - 1 will likely lead to a balanced trade-off between query overhead and hit ratio, provided cooperation is done among useful neighbors. For a number of reasons the selection of useful neighbors is not efficient. An obvious reason is that web access patterns change dynamically. Furthermore, availability of proxies is not always globally known. This paper proposes a set of algorithms that enable proxies to independently explore the network and choose the k most beneficial (according to local criteria) neighbors in a dynamic fashion. The simulation experiments illustrate that the proposed dynamic neighbor reconfiguration schemes significantly reduce the overhead incurred by the mesh topology while yielding higher hit ratios compared to the static approach.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115901517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}