Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000: Latest Articles, Page 10

Register assignment for software pipelining with partitioned register banks

Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000 Pub Date : 2000-05-01 DOI: 10.1109/IPDPS.2000.845983

Jason Hiser, S. Carr, P. Sweany, S. Beaty

Abstract: Many techniques for increasing the amount of instruction-level parallelism (ILP) put increased pressure on the registers inside a CPU. These techniques allow more operations to occur simultaneously at the cost of requiring more registers to hold the operands and results of those operations and, importantly, more ports on the register banks to allow concurrent access to the data. One approach to reducing the number of ports on a register bank (the cost of ports in gates varies as N², where N is the number of ports, and adding ports increases access time) is to have multiple register banks with fewer ports, each attached to a subset of the available functional units. This reduces the number of ports needed per bank, but can slow operations if a necessary value is not in an attached register bank, because copy operations must be inserted. Therefore, there is a circular dependence between assigning operations to functional units and assigning values to register banks. We describe an approach that produces good code by separating partitioning from scheduling and register assignment. Our method is independent of both the scheduling technique and the register assignment method used.
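The port-cost argument in the abstract above can be made concrete with a small calculation. This is only an illustrative sketch, not anything taken from the paper: it assumes the relative gate cost of a bank is exactly the square of its port count, that ports are split evenly across banks, and that the extra cost of inter-bank copy operations is ignored. The function names and the 16-port figure are hypothetical.

```python
def port_cost(ports: int) -> int:
    """Relative gate cost of one register bank, assuming cost grows as ports squared."""
    return ports ** 2


def partitioned_cost(total_ports: int, banks: int) -> int:
    """Total relative cost when total_ports are split evenly across `banks` banks."""
    if banks <= 0 or total_ports % banks != 0:
        raise ValueError("total_ports must divide evenly among a positive number of banks")
    return banks * port_cost(total_ports // banks)


if __name__ == "__main__":
    total = 16  # hypothetical port count, chosen only for illustration
    for banks in (1, 2, 4):
        # Under these assumptions the total cost falls in proportion to the bank count.
        print(f"{banks} bank(s): relative cost {partitioned_cost(total, banks)}")
```

Under these assumptions, splitting into k banks divides the total port cost by k, which is the saving that has to be weighed against the inserted copy operations the abstract mentions.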

Citations: 15

Efficient binary morphological algorithms on a massively parallel processor

Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000 Pub Date : 2000-05-01 DOI: 10.1109/IPDPS.2000.845997

Andreas I. Svolos, C. Konstantopoulos, C. Kaklamanis

Citations: 8

High performance parametric modeling with Nimrod/G: killer application for the global grid?

Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000 Pub Date : 2000-05-01 DOI: 10.1109/IPDPS.2000.846030

D. Abramson, J. Giddy, Lew Kotler

Citations: 587

A decision-process analysis of implicit coscheduling

Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000 Pub Date : 2000-05-01 DOI: 10.1109/IPDPS.2000.845972

R. Poovendran, P. Keleher, J. Baras

Citations: 3

Safe caching in a distributed file system for network attached storage

Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000 Pub Date : 2000-05-01 DOI: 10.1109/IPDPS.2000.845977

R. Burns, R. Rees, D. Long

Citations: 15

Evaluation of P³T+: a performance estimator for distributed and parallel applications

Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000 Pub Date : 2000-05-01 DOI: 10.1109/IPDPS.2000.845989

Thomas Fahringer, A. Pozgaj, J. Luitz, H. Moritsch

Abstract: In this paper, we report on experiences with P³T+, a performance estimator for distributed and parallel programs which is used to examine at compile time the performance outcome of changes in code, problem and machine sizes, and target architectures. P³T+ computes a variety of performance parameters including work distribution, number of transfers, amount of data transferred, transfer times, computation times, and number of cache misses. It is unique in that it models programs, code transformations, and parallel and distributed architectures, and derives a performance prediction based on all three of these elements. P³T+ is the successor of P³T, which computed a similar set of performance parameters, but for parallel programs only. P³T+ has been redesigned and reimplemented from scratch and goes beyond P³T by extending the class of programs that can be handled and by employing several novel estimation methods (symbolic analysis, simulation, pre-measured kernel codes, etc.). The core part of this paper reports on the evaluation of P³T+ to demonstrate both the accuracy and the usefulness of this tool for realistic kernel codes taken from real-world applications (pricing of financial derivatives and quantum mechanical calculations of solids).

Citations: 2

Improving routing performance in Myrinet networks

Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000 Pub Date : 2000-05-01 DOI: 10.1109/IPDPS.2000.845961

J. Flich, Manuel P. Malumbres, P. López, J. Duato

Citations: 48

Bandwidth-efficient collective communication for clustered wide area systems

Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000 Pub Date : 2000-05-01 DOI: 10.1109/IPDPS.2000.846026

T. Kielmann, H. Bal, S. Gorlatch

Abstract: Metacomputing infrastructures couple multiple clusters (or MPPs) via wide-area networks. A major problem in programming parallel applications for such platforms is their hierarchical network structure: latency and bandwidth of WANs often are orders of magnitude worse than those of local networks. Our goal is to optimize MPI's collective operations for such platforms. In this paper we focus on optimized utilization of the (scarce) wide-area bandwidth. We use two techniques: selecting suitable communication graph shapes, and splitting messages into multiple segments that are sent in parallel over different WAN links. To determine the best graph shape and segment size, we introduce a performance model called parameterized LogP (P-LogP), a hierarchical extension of the LogP model that covers messages of arbitrary length. With P-LogP, the optimal segment size and the best broadcast tree shape can be determined at runtime. (For conciseness, we restrict our discussion to the broadcast operation.) An experimental performance evaluation shows that the new broadcast has significantly improved performance (for large messages) and that there is a close match between the theoretical model and the measured completion times.
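The idea of choosing a segment size at runtime by minimizing a cost model can be sketched as follows. This is not the paper's P-LogP model: it substitutes a much simpler, hypothetical model in which each segment costs a fixed overhead plus its transmission time, and segments are pipelined along a chain of hops. All function names, parameters, and numeric values are assumptions made for illustration.

```python
import math


def segment_gap(seg_bytes: int, overhead_s: float, bytes_per_s: float) -> float:
    """Hypothetical per-segment cost: fixed overhead plus transmission time."""
    return overhead_s + seg_bytes / bytes_per_s


def chain_broadcast_time(msg_bytes: int, seg_bytes: int, hops: int,
                         latency_s: float, overhead_s: float,
                         bytes_per_s: float) -> float:
    """Completion time of a pipelined broadcast along a chain of `hops` links."""
    segments = math.ceil(msg_bytes / seg_bytes)
    gap = segment_gap(seg_bytes, overhead_s, bytes_per_s)
    # The first segment crosses every hop; each later segment follows one gap behind.
    return hops * (latency_s + gap) + (segments - 1) * gap


def best_segment_size(msg_bytes: int, candidates: list[int], **link: float) -> int:
    """Pick the candidate segment size with the smallest modeled completion time."""
    return min(candidates, key=lambda s: chain_broadcast_time(msg_bytes, s, **link))


if __name__ == "__main__":
    # Hypothetical WAN parameters, not measurements from the paper.
    link = dict(hops=3, latency_s=0.01, overhead_s=0.001, bytes_per_s=1e6)
    sizes = [1024, 8192, 65536, 1048576]
    best = best_segment_size(1048576, sizes, **link)
    print("best segment size:", best)
    print("modeled time:", chain_broadcast_time(1048576, best, **link))
```

The trade-off the sketch exposes is the usual one: small segments pay the fixed overhead many times, while large segments delay the start of pipelining, so the minimum lies somewhere in between.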

Citations: 89

Deterministic replay of distributed Java applications

Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000 Pub Date : 2000-05-01 DOI: 10.1109/IPDPS.2000.845988

Ravi B. Konuru, H. Srinivasan, Jong-Deok Choi

Citations: 92

Efficient virtual interface architecture (VIA) support for the IBM SP switch-connected NT clusters

Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000 Pub Date : 2000-05-01 DOI: 10.1109/IPDPS.2000.845962

M. Banikazemi, V. Moorthy, D. Panda, L. Herger, B. Abali

Abstract: The IBM SP Switch-Connected NT cluster is one of the newest clustering platforms available. In this paper, we discuss an experimental implementation of the Virtual Interface Architecture (VIA) for this platform. We discuss different design issues involved in this implementation. In particular, we explain how the virtual-to-physical address translation can be implemented efficiently with a minimum Network Interface Card (NIC) memory requirement. We show how caching the VIA descriptors on the NIC can reduce the communication latency. We also present an efficient scheme for implementing the VIA doorbells without any hardware support. A comprehensive performance evaluation study of the implementation is provided. The performance of the implemented VIA surpasses that of other existing software implementations of the VIA and is comparable to that of a hardware VIA implementation. The peak measured bandwidth for our system is 101.4 MBytes/s and the one-way latency for short messages is 18.2 microseconds. Note that the VIA implementation presented in this paper is not part of any IBM product, and no assumptions should be made regarding its availability as a product in the future.

Citations: 30