Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162): Latest Articles

A register allocation technique using register existence graph

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622673

A. Koseki, Y. Fukazawa, H. Komatsu

Citations: 0

Efficient parallel algorithms for optimally locating a k-leaf tree in a tree network

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622537

S. Ku, W. Shih, Biing-Feng Wang

Citations: 4

Efficient parallel algorithms on distance-hereditary graphs

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622541

S. Hsieh, Chin-Wen Ho, T. Hsu, M. Ko, Gen-Huey Chen

Citations: 17

Turn grouping for efficient barrier synchronization in wormhole mesh networks

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622588

Kuo-Pao Fan, C. King

Citations: 3

How much does network contention affect distributed shared memory performance?

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622680

Donglai Dai, D. Panda

Abstract: Most recent research on distributed shared memory (DSM) systems has focused on either careful design of node controllers or cache coherence protocols. While evaluating these designs, simplified models of networks (constant latency or average latency based on the network size) are typically used. Such models completely ignore network contention. To help network designers design better networks for DSM systems, in this paper we focus on two goals: 1) to isolate and quantify the impact of network link contention and network interface contention on the overall performance of DSM applications and 2) to study the impact of critical architectural parameters on these two categories of network contention. We achieve these goals by evaluating a set of SPLASH2 benchmarks on a DSM simulator using three network models. For an 8×8 wormhole system, our results show that network contention can degrade performance by up to 59.8%. Out of this, up to 7.2% is caused by network interface contention alone. The study indicates that network contention becomes dominant for DSM systems using small caches, wide cache line sizes, low degrees of associativity, high processing node speeds, high memory speeds, low network speeds, or small network link widths.

Citations: 37

Broadcast-efficient sorting in the presence of few channels

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622534

K. Nakano, S. Olariu, J. Schwing

Citations: 5

An Euler path based technique for deadlock-free multicasting

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622669

N. Agrawal, C. Ravikumar

Abstract: The existing algorithms for deadlock-free multicasting in interconnection networks assume the Hamiltonian property in the network's topology. However, these networks fail to be Hamiltonian in the presence of faults. This paper investigates the use of Euler circuits in deadlock-free multicasting. Not only are Euler circuits known to exist in all connected networks, but a fast polynomial-time algorithm also exists to find an Euler circuit in a network. We present a multicasting algorithm which works for both regular and irregular topologies. Our algorithm is applicable to store-and-forward as well as wormhole-routed networks. We show that at most two virtual channels are required per physical channel for any connected network. We also prove that no virtual channels are required to achieve deadlock-free multicasting on a large class of networks. Unlike other existing algorithms for deadlock-free multicasting in faulty networks, our algorithm requires a small amount of information to be stored at each node. The potential of our technique is further illustrated with the help of various examples. A performance analysis on wormhole-routed networks shows that our routing algorithm outperforms existing multicasting procedures.
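The abstract above relies on the fact that an Euler circuit can be found in polynomial time. As a rough illustration only, the sketch below uses Hierholzer's algorithm, a standard textbook method; the listing does not say which algorithm the authors use. It assumes each physical link is modeled as two opposite directed channels, which is why every connected network has such a circuit. The function name `euler_circuit` and the `links` input format are hypothetical.

```python
from collections import defaultdict


def euler_circuit(links, start):
    """Return a closed walk that uses every directed channel exactly once.

    Assumption: `links` is an iterable of undirected (u, v) pairs, and each
    pair stands for two opposite directed channels, so in-degree equals
    out-degree at every node. The network is assumed to be connected.
    """
    # Unused outgoing channels for each node.
    out = defaultdict(list)
    for u, v in links:
        out[u].append(v)  # channel u -> v
        out[v].append(u)  # channel v -> u

    stack, circuit = [start], []
    while stack:
        node = stack[-1]
        if out[node]:
            # Follow an unused channel out of the current node.
            stack.append(out[node].pop())
        else:
            # Dead end: the node is finished, so emit it.
            circuit.append(stack.pop())
    # Nodes were emitted in reverse order of traversal.
    return circuit[::-1]
```

For example, `euler_circuit([(0, 1), (1, 2), (2, 0), (2, 3)], 0)` walks a triangle with one pendant node, an irregular topology with no Hamiltonian cycle, and returns a node sequence that starts and ends at node 0 and crosses each of the eight directed channels once.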

Citations: 4

Improving the performance of out-of-core computations

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622574

M. Kandemir, J. Ramanujam, A. Choudhary

Citations: 19

Embedding of binomial trees in hypercubes with link faults

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622564

Jie Wu, E. Fernández, Ying-Chen Lo

Citations: 19

Performance and configuration of hierarchical ring networks for multiprocessors

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622653

V. Hamacher, Hong Jiang

Abstract: Analytical queueing network models for expected message delay in 2-level and 3-level hierarchical-ring interconnection networks (INs) are developed. Such networks have recently been used in commercial and research prototype multiprocessors. A major class of traffic carried by these INs consists of cache line transfers, and associated coherency control messages, between processor caches and remote memory modules in shared-memory multiprocessors. Memory modules are assumed to be evenly distributed over the processor nodes. Such traffic consists of short, fixed-length messages. They can be conveniently transported using the slotted ring transmission technique, which is studied here. The message delay results derived from the models are shown to be quite accurate when checked against a simulation study. The comparisons to simulations include heavy traffic situations where queueing delays in ring crossover switches are significant for ring utilization levels of 80 to 90%. As well as facilitating analysis, the analytical models can be used to determine optimal sizes for the rings at different levels in the hierarchy under specified traffic distributions in a system with a given total number of processor nodes. Optimality is in terms of minimizing average message delay. A specific example of such a design exercise is provided for the uniform traffic case.

Citations: 10