{"title":"Register assignment for software pipelining with partitioned register banks","authors":"Jason Hiser, S. Carr, P. Sweany, S. Beaty","doi":"10.1109/IPDPS.2000.845983","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845983","url":null,"abstract":"Many techniques for increasing the amount of instruction-level parallelism (ILP) put increased pressure on the registers inside a CPU. These techniques allow for more operations to occur simultaneously at the cost of requiring more registers to hold the operands and results of those operations, and importantly, more ports on the register banks to allow for concurrent access to the data. One approach of ameliorating the number of ports on a register bank (the cost of ports in gates varies as N/sup 2/ where N is the number of ports, and adding ports increases access time) is to have multiple register banks with fewer ports, each attached to a subset of the available functional units. This reduces the number of ports needed on a per-bank basis, but can slow operations if a necessary value is not in an attached register bank as copy operations must be inserted. Therefore, there is a circular dependence between assigning operations to functional units and assigning values to register banks. We describe an approach that produces good code by separating partitioning from scheduling and register assignment. Our method is independent of both the scheduling technique and register assignment method used.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115503203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient binary morphological algorithms on a massively parallel processor","authors":"Andreas I. Svolos, C. Konstantopoulos, C. Kaklamanis","doi":"10.1109/IPDPS.2000.845997","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845997","url":null,"abstract":"One of the most important features in image analysis and understanding is shape. Mathematical morphology is the image processing branch that deals with shape analysis. The definition of all morphological transformations is based on two primitive operations, i.e. dilation and erosion. Since many applications require the solution of morphological problems in real time, researching time efficient algorithms for these two operations is crucial. In this paper efficient parallel algorithms for the binary dilation and erosion are presented and evaluated for an advanced associative processor. Simulation results indicate that the achieved speedup is linear.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117197482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High performance parametric modeling with Nimrod/G: killer application for the global grid?","authors":"D. Abramson, J. Giddy, Lew Kotler","doi":"10.1109/IPDPS.2000.846030","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846030","url":null,"abstract":"This paper examines the role of parametric modeling as an application for the global computing grid, and explores some heuristics which make it possible to specify soft real time deadlines for larger computational experiments. We demonstrate the scheme with a case study utilizing the Globus toolkit running on the GUSTO testbed.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116125771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A decision-process analysis of implicit coscheduling","authors":"R. Poovendran, P. Keleher, J. Baras","doi":"10.1109/IPDPS.2000.845972","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845972","url":null,"abstract":"This paper presents a theoretical framework based on Bayesian decision theory for analyzing recently reported results on implicit coscheduling of parallel applications on clusters of workstations. Using probabilistic modeling, we show that the approach presented can be applied for processes with arbitrary communication mixes. We also note that our approach can be used for deciding the additional spin times in the case of spin-yield. Finally, we present arguments for the use of a different notion of fairness than assumed by prior work.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116361142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Safe caching in a distributed file system for network attached storage","authors":"R. Burns, R. Rees, D. Long","doi":"10.1109/IPDPS.2000.845977","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845977","url":null,"abstract":"In a distributed file system built on network attached storage, client computers access data directly from shared storage, rather than submitting I/O requests through a server. Without a server marshaling access to data, if a computer fails or becomes isolated in a network partition while holding locks on cached data objects, those objects become inaccessible to other computers until a locking authority can guarantee that the lock holder will not again directly access these data. We describe a server that acts as the locking authority and implements a lease-based protocol for revoking access to data objects locked by an isolated or failed computer. When a lease expires, the server can be assured that the client no longer acts on locked data, and can safely redistribute locks to other clients. During normal operation, this protocol invokes no message overhead, and uses no memory and performs no computation at the locking authority.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122453883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of P/sup 3/T+: a performance estimator for distributed and parallel applications","authors":"Thomas Fahringer, A. Pozgaj, J. Luitz, H. Moritsch","doi":"10.1109/IPDPS.2000.845989","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845989","url":null,"abstract":"In this paper, we report on experiences with P/sup 3/T+, a performance estimator for distributed and parallel programs which is used to examine at compile time the performance outcome of changes in code, problem and machine sizes, and target architectures. P/sup 3/T+ computes a variety of performance parameters including work distribution, number of transfers, amount of data transferred, transfer times, computation times, and number of cache misses. It is unique in that it models programs, code transformations and parallel and distributed architectures and derives a performance prediction based on all three of these elements. P/sup 3/T+ is the successor tool of P/sup 3/T which computed a similar set of performance parameters, however for parallel programs only. P/sup 3/T+ has been re-designed and re-implemented from scratch and goes beyond P/sup 3/T by extending the class of programs that can be handled and by employing several novel estimation methods (symbolic analysis, simulation, pre-measured kernel codes, etc.). The core part of this paper reports on the evaluation of P/sup 3/T+ to demonstrate both accuracy and usefulness of this tool for realistic kernel codes taken from real-world applications (pricing of financial derivatives and quantum mechanical calculations of solids).","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121919150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving routing performance in Myrinet networks","authors":"J. Flich, Manuel P. Malumbres, P. López, J. Duato","doi":"10.1109/IPDPS.2000.845961","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845961","url":null,"abstract":"Networks of workstations (NOWs) are becoming increasingly popular as a cost-effective alternative to parallel computers. Typically, these networks connect processors using irregular topologies, providing the wiring flexibility, scalability, and incremental expansion capability required in this environment. In some of these networks, packets are delivered using source routing. Due to the irregular topology, the routing scheme is often non-minimal. In this paper we analyze the routing scheme used in Myrinet networks in order to improve its performance. We propose new routing algorithms that balance the utilization of the available routes and always use minimal paths. We show through simulation that the current routing schemes used in Myrinet networks can be improved by modifying only the routing software without increasing the software overhead significantly. The overall throughput can be doubled without modifying the network hardware.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123438554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bandwidth-efficient collective communication for clustered wide area systems","authors":"T. Kielmann, H. Bal, S. Gorlatch","doi":"10.1109/IPDPS.2000.846026","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846026","url":null,"abstract":"Metacomputing infrastructures couple multiple clusters (or MPPs) via wide-area networks. A major problem in programming parallel applications for such platforms is their hierarchical network structure: latency and bandwidth of WANs often are orders of magnitude worse than those of local networks. Our goal is to optimize MPI's collective operations for such platforms. In this paper we focus on optimized utilization of the (scarce) wide-area bandwidth. We use two techniques: selecting suitable communication graph shapes, and splitting messages into multiple segments that are sent in parallel over different WAN links. To determine the best graph shape and segment size, we introduce a performance model called parameterized LogP (P-LogP), a hierarchical extension of the LogP model that covers messages of arbitrary length. With P-LogP, the optimal segment size and the best broadcast tree shape can be determined at runtime. (For conciseness, we restrict our discussion to the broadcast operation). An experimental performance evaluation shows that the new broadcast has significantly improved performance (for large messages) and that there is a close match between the theoretical model and the measured completion times.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127800871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deterministic replay of distributed Java applications","authors":"Ravi B. Konuru, H. Srinivasan, Jong-Deok Choi","doi":"10.1109/IPDPS.2000.845988","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845988","url":null,"abstract":"Execution behavior of a Java application can be nondeterministic due to concurrent threads of execution, thread scheduling, and variable network delays. This nondeterminism in Java makes the understanding and debugging of multi-threaded distributed Java applications a difficult and a laborious process. It is well accepted that providing deterministic replay of application execution is a key step towards programmer productivity and program understanding. Towards this goal, we developed a replay framework based on logical thread schedules and logical intervals. An application of this framework was previously published in the context of a system called Deja Vu that provides deterministic replay of multi-threaded Java programs on a single Java Virtual Machine (JVM). In contrast, this paper focuses on distributed Deja Vu that provides deterministic replay of distributed Java applications running on multiple JVMs. We describe the issues and present the design, implementation and preliminary performance results of distributed Deja Vu that supports both multi-threaded and distributed Java applications.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131982064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient virtual interface architecture (VIA) support for the IBM SP switch-connected NT clusters","authors":"M. Banikazemi, V. Moorthy, D. Panda, L. Herger, B. Abali","doi":"10.1109/IPDPS.2000.845962","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845962","url":null,"abstract":"The IBM SP Switch-Connected NT cluster is one of the newest clustering platforms available. In this paper, we discuss an experimental implementation of the Virtual Interface Architecture for this platform. We discuss different design issues involved in this implementation. In particular, we explain how the virtual-to-physical address translation can be implemented efficiently with a minimum Network Interface Card (NIC) memory requirement. We show how caching the VIA descriptors on the NIC can reduce the communication latency. We also present an efficient scheme for implementing the VIA door bells without any hardware support. A comprehensive performance evaluation study of the implementation is provided. The performance of the implemented VIA surpasses that of other existing software implementations of the VIA and is comparable to that of a hardware VIA implementation. The peak measured bandwidth for our system is observed to be 101.4 MBytes/s and the one-way latency for short messages is 18.2 microseconds. It is to be noted that the VIA implementation presented in this paper is not a part of any IBM product and no assumptions should be made regarding its availability as a product in the future.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132075231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}