Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000: Latest Papers

Register assignment for software pipelining with partitioned register banks
Jason Hiser, S. Carr, P. Sweany, S. Beaty
Abstract: Many techniques for increasing the amount of instruction-level parallelism (ILP) put increased pressure on the registers inside a CPU. These techniques allow more operations to occur simultaneously at the cost of requiring more registers to hold the operands and results of those operations and, importantly, more ports on the register banks to allow concurrent access to the data. One approach to reducing the number of ports on a register bank (the cost of ports in gates varies as N^2, where N is the number of ports, and adding ports increases access time) is to have multiple register banks with fewer ports, each attached to a subset of the available functional units. This reduces the number of ports needed per bank, but can slow operations if a necessary value is not in an attached register bank, because copy operations must be inserted. Therefore, there is a circular dependence between assigning operations to functional units and assigning values to register banks. We describe an approach that produces good code by separating partitioning from scheduling and register assignment. Our method is independent of both the scheduling technique and the register assignment method used.
DOI: https://doi.org/10.1109/IPDPS.2000.845983
Published: 2000-05-01
Citations: 15
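The port-cost argument in the abstract above can be checked with a little arithmetic. The sketch below is only an illustration, not the authors' model: it assumes the relative gate cost of a bank is exactly N^2 for N ports, that ports divide evenly across banks, and it ignores the copy operations that partitioning introduces. The function names `port_cost` and `partitioned_cost` are hypothetical.

```python
def port_cost(ports: int) -> int:
    """Relative gate cost of one register bank with `ports` ports.

    Assumption: cost is taken to be exactly N^2, following the abstract's
    statement that port cost "varies as N^2".
    """
    return ports * ports


def partitioned_cost(total_ports: int, banks: int) -> int:
    """Total relative cost when the ports are split evenly across `banks` banks.

    Assumption: each bank serves only its own functional units, so the
    ports divide evenly; inter-bank copy costs are not modeled.
    """
    if banks <= 0 or total_ports % banks != 0:
        raise ValueError("ports must divide evenly across a positive number of banks")
    per_bank = total_ports // banks
    return banks * port_cost(per_bank)


if __name__ == "__main__":
    # One 16-port bank versus four 4-port banks.
    print(port_cost(16), partitioned_cost(16, 4))
```

Under these assumptions, splitting N ports across k banks lowers the total port cost from N^2 to N^2/k, which is the saving the partitioned design trades against the extra copy operations.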
Efficient binary morphological algorithms on a massively parallel processor
Andreas I. Svolos, C. Konstantopoulos, C. Kaklamanis
Abstract: One of the most important features in image analysis and understanding is shape. Mathematical morphology is the image processing branch that deals with shape analysis. The definition of all morphological transformations is based on two primitive operations, dilation and erosion. Since many applications require the solution of morphological problems in real time, researching time-efficient algorithms for these two operations is crucial. In this paper, efficient parallel algorithms for binary dilation and erosion are presented and evaluated for an advanced associative processor. Simulation results indicate that the achieved speedup is linear.
DOI: https://doi.org/10.1109/IPDPS.2000.845997
Published: 2000-05-01
Citations: 8
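For readers unfamiliar with the two primitives named in the abstract above, here is a minimal sequential sketch of binary dilation and erosion. It is not the paper's parallel algorithm for an associative processor; it uses the standard set-based definitions, and the representation (a set of foreground pixel coordinates) and the names `dilate` and `erode` are assumptions made for illustration.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, column) of a foreground pixel


def dilate(image: Set[Pixel], selem: Set[Pixel]) -> Set[Pixel]:
    """Binary dilation: shift every foreground pixel by every offset in
    the structuring element and take the union of the results."""
    return {(r + dr, c + dc) for (r, c) in image for (dr, dc) in selem}


def erode(image: Set[Pixel], selem: Set[Pixel]) -> Set[Pixel]:
    """Binary erosion: keep position p only if the structuring element
    translated to p lies entirely inside the foreground."""
    if not selem:
        raise ValueError("structuring element must not be empty")
    # Any surviving p must satisfy p + s0 in image for one fixed offset s0,
    # which bounds the set of candidates we need to test.
    dr0, dc0 = next(iter(selem))
    candidates = {(r - dr0, c - dc0) for (r, c) in image}
    return {
        (r, c)
        for (r, c) in candidates
        if all((r + dr, c + dc) in image for (dr, dc) in selem)
    }


if __name__ == "__main__":
    square = {(r, c) for r in range(3) for c in range(3)}  # 3x3 block
    cross = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}     # plus-shaped element
    print(sorted(erode(square, cross)))
    print(sorted(dilate({(1, 1)}, cross)))
```

Each output pixel depends only on a fixed neighborhood of the input, which is why these operations are natural candidates for the data-parallel treatment the paper describes.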
High performance parametric modeling with Nimrod/G: killer application for the global grid?
D. Abramson, J. Giddy, Lew Kotler
Abstract: This paper examines the role of parametric modeling as an application for the global computing grid, and explores some heuristics that make it possible to specify soft real-time deadlines for larger computational experiments. We demonstrate the scheme with a case study utilizing the Globus toolkit running on the GUSTO testbed.
DOI: https://doi.org/10.1109/IPDPS.2000.846030
Published: 2000-05-01
Citations: 587
A decision-process analysis of implicit coscheduling
R. Poovendran, P. Keleher, J. Baras
Abstract: This paper presents a theoretical framework based on Bayesian decision theory for analyzing recently reported results on implicit coscheduling of parallel applications on clusters of workstations. Using probabilistic modeling, we show that the approach presented can be applied to processes with arbitrary communication mixes. We also note that our approach can be used for deciding the additional spin times in the case of spin-yield. Finally, we present arguments for the use of a different notion of fairness than that assumed by prior work.
DOI: https://doi.org/10.1109/IPDPS.2000.845972
Published: 2000-05-01
Citations: 3
Safe caching in a distributed file system for network attached storage
R. Burns, R. Rees, D. Long
Abstract: In a distributed file system built on network attached storage, client computers access data directly from shared storage, rather than submitting I/O requests through a server. Without a server marshaling access to data, if a computer fails or becomes isolated in a network partition while holding locks on cached data objects, those objects become inaccessible to other computers until a locking authority can guarantee that the lock holder will not again directly access these data. We describe a server that acts as the locking authority and implements a lease-based protocol for revoking access to data objects locked by an isolated or failed computer. When a lease expires, the server can be assured that the client no longer acts on locked data, and can safely redistribute locks to other clients. During normal operation, this protocol incurs no message overhead, uses no memory, and performs no computation at the locking authority.
DOI: https://doi.org/10.1109/IPDPS.2000.845977
Published: 2000-05-01
Citations: 15
Evaluation of P³T+: a performance estimator for distributed and parallel applications
Thomas Fahringer, A. Pozgaj, J. Luitz, H. Moritsch
Abstract: In this paper, we report on experiences with P³T+, a performance estimator for distributed and parallel programs that is used to examine, at compile time, the performance outcome of changes in code, problem and machine sizes, and target architectures. P³T+ computes a variety of performance parameters, including work distribution, number of transfers, amount of data transferred, transfer times, computation times, and number of cache misses. It is unique in that it models programs, code transformations, and parallel and distributed architectures, and derives a performance prediction based on all three of these elements. P³T+ is the successor of P³T, which computed a similar set of performance parameters, but for parallel programs only. P³T+ has been redesigned and reimplemented from scratch and goes beyond P³T by extending the class of programs that can be handled and by employing several novel estimation methods (symbolic analysis, simulation, pre-measured kernel codes, etc.). The core part of this paper reports on the evaluation of P³T+ to demonstrate both the accuracy and the usefulness of this tool for realistic kernel codes taken from real-world applications (pricing of financial derivatives and quantum mechanical calculations of solids).
DOI: https://doi.org/10.1109/IPDPS.2000.845989
Published: 2000-05-01
Citations: 2
Improving routing performance in Myrinet networks
J. Flich, Manuel P. Malumbres, P. López, J. Duato
Abstract: Networks of workstations (NOWs) are becoming increasingly popular as a cost-effective alternative to parallel computers. Typically, these networks connect processors using irregular topologies, providing the wiring flexibility, scalability, and incremental expansion capability required in this environment. In some of these networks, packets are delivered using source routing. Due to the irregular topology, the routing scheme is often non-minimal. In this paper we analyze the routing scheme used in Myrinet networks in order to improve its performance. We propose new routing algorithms that balance the utilization of the available routes and always use minimal paths. We show through simulation that the current routing schemes used in Myrinet networks can be improved by modifying only the routing software without increasing the software overhead significantly. The overall throughput can be doubled without modifying the network hardware.
DOI: https://doi.org/10.1109/IPDPS.2000.845961
Published: 2000-05-01
Citations: 48
Bandwidth-efficient collective communication for clustered wide area systems
T. Kielmann, H. Bal, S. Gorlatch
Abstract: Metacomputing infrastructures couple multiple clusters (or MPPs) via wide-area networks. A major problem in programming parallel applications for such platforms is their hierarchical network structure: latency and bandwidth of WANs are often orders of magnitude worse than those of local networks. Our goal is to optimize MPI's collective operations for such platforms. In this paper we focus on optimized utilization of the (scarce) wide-area bandwidth. We use two techniques: selecting suitable communication graph shapes, and splitting messages into multiple segments that are sent in parallel over different WAN links. To determine the best graph shape and segment size, we introduce a performance model called parameterized LogP (P-LogP), a hierarchical extension of the LogP model that covers messages of arbitrary length. With P-LogP, the optimal segment size and the best broadcast tree shape can be determined at runtime. (For conciseness, we restrict our discussion to the broadcast operation.) An experimental performance evaluation shows that the new broadcast has significantly improved performance (for large messages) and that there is a close match between the theoretical model and the measured completion times.
DOI: https://doi.org/10.1109/IPDPS.2000.846026
Published: 2000-05-01
Citations: 89
Deterministic replay of distributed Java applications
Ravi B. Konuru, H. Srinivasan, Jong-Deok Choi
Abstract: Execution behavior of a Java application can be nondeterministic due to concurrent threads of execution, thread scheduling, and variable network delays. This nondeterminism in Java makes the understanding and debugging of multi-threaded distributed Java applications a difficult and laborious process. It is well accepted that providing deterministic replay of application execution is a key step towards programmer productivity and program understanding. Towards this goal, we developed a replay framework based on logical thread schedules and logical intervals. An application of this framework was previously published in the context of a system called Deja Vu that provides deterministic replay of multi-threaded Java programs on a single Java Virtual Machine (JVM). In contrast, this paper focuses on distributed Deja Vu, which provides deterministic replay of distributed Java applications running on multiple JVMs. We describe the issues and present the design, implementation, and preliminary performance results of distributed Deja Vu, which supports both multi-threaded and distributed Java applications.
DOI: https://doi.org/10.1109/IPDPS.2000.845988
Published: 2000-05-01
Citations: 92
Efficient virtual interface architecture (VIA) support for the IBM SP switch-connected NT clusters
M. Banikazemi, V. Moorthy, D. Panda, L. Herger, B. Abali
Abstract: The IBM SP Switch-Connected NT cluster is one of the newest clustering platforms available. In this paper, we discuss an experimental implementation of the Virtual Interface Architecture for this platform. We discuss different design issues involved in this implementation. In particular, we explain how the virtual-to-physical address translation can be implemented efficiently with a minimum Network Interface Card (NIC) memory requirement. We show how caching the VIA descriptors on the NIC can reduce the communication latency. We also present an efficient scheme for implementing the VIA doorbells without any hardware support. A comprehensive performance evaluation study of the implementation is provided. The performance of the implemented VIA surpasses that of other existing software implementations of the VIA and is comparable to that of a hardware VIA implementation. The peak measured bandwidth for our system is observed to be 101.4 MBytes/s and the one-way latency for short messages is 18.2 microseconds. It is to be noted that the VIA implementation presented in this paper is not a part of any IBM product and no assumptions should be made regarding its availability as a product in the future.
DOI: https://doi.org/10.1109/IPDPS.2000.845962
Published: 2000-05-01
Citations: 30