Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162): Latest Articles

A register allocation technique using register existence graph
A. Koseki, Y. Fukazawa, H. Komatsu
{"title":"A register allocation technique using register existence graph","authors":"A. Koseki, Y. Fukazawa, H. Komatsu","doi":"10.1109/ICPP.1997.622673","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622673","url":null,"abstract":"Optimizing compilation is very important for generating code sequences that utilize the characteristics of processor architectures. One of the most essential optimization techniques is register allocation. In register allocation that takes account of instruction-level parallelism, anti-dependences generated when the same register is allocated to different variables, and spill code generated when the number of registers is insufficient, should be handled in such a way that the parallelism in a program is not lost. In our method, we realized register allocation using a new data structure called the register existence graph, in which the parallelism in a program is well expressed.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125029525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient parallel algorithms for optimally locating a k-leaf tree in a tree network
S. Ku, W. Shih, Biing-Feng Wang
{"title":"Efficient parallel algorithms for optimally locating a k-leaf tree in a tree network","authors":"S. Ku, W. Shih, Biing-Feng Wang","doi":"10.1109/ICPP.1997.622537","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622537","url":null,"abstract":"In this paper, an efficient parallel algorithm is proposed for finding a k-tree core of a tree network. The proposed algorithm performs on the EREW PRAM in O(log n log* n) time using O(n) work.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123551525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Efficient parallel algorithms on distance-hereditary graphs
S. Hsieh, Chin-Wen Ho, T. Hsu, M. Ko, Gen-Huey Chen
{"title":"Efficient parallel algorithms on distance-hereditary graphs","authors":"S. Hsieh, Chin-Wen Ho, T. Hsu, M. Ko, Gen-Huey Chen","doi":"10.1109/ICPP.1997.622541","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622541","url":null,"abstract":"We present efficient parallel algorithms for finding a minimum weighted connected dominating set and a minimum weighted Steiner tree for a distance-hereditary graph, which take O(log n) time using O(n+m) processors on a CRCW PRAM, where n and m are the number of vertices and edges of a given graph, respectively. We also find a maximum weighted clique of a distance-hereditary graph in O(log^2 n) time using O(n+m) processors on a CREW PRAM.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114601377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
Turn grouping for efficient barrier synchronization in wormhole mesh networks
Kuo-Pao Fan, C. King
{"title":"Turn grouping for efficient barrier synchronization in wormhole mesh networks","authors":"Kuo-Pao Fan, C. King","doi":"10.1109/ICPP.1997.622588","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622588","url":null,"abstract":"Barrier is an important synchronization operation. On scalable parallel computers, it is often implemented as a collective communication with a reduction operation followed by a distribution operation. In this paper, we introduce a systematic way of generating efficient algorithms to perform barrier synchronization in mesh networks. The scheme works with any base routing algorithm derivable from the turn model. Our scheme extends the turn grouping method with two new algorithms, Tail to Central and Central to Tail, for scheduling the message transmission in the reduction and distribution phase respectively. Simulation results show that our approach can take advantage of the adaptivity of the turn-model based routing algorithms and outperform methods proposed previously.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128412912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
How much does network contention affect distributed shared memory performance?
Donglai Dai, D. Panda
{"title":"How much does network contention affect distributed shared memory performance?","authors":"Donglai Dai, D. Panda","doi":"10.1109/ICPP.1997.622680","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622680","url":null,"abstract":"Most recent research on distributed shared memory (DSM) systems has focused on either careful design of node controllers or cache coherence protocols. While evaluating these designs, simplified models of networks (constant latency or average latency based on the network size) are typically used. Such models completely ignore network contention. To help network designers to design better networks for DSM systems, in this paper, we focus on two goals: 1) to isolate and quantify the impact of network link contention and network interface contention on the overall performance of DSM applications and 2) to study the impact of critical architectural parameters on these two categories of network contention. We achieve these goals by evaluating a set of SPLASH2 benchmarks on a DSM simulator using three network models. For an 8×8 wormhole system, our results show that network contention can degrade performance up to 59.8%. Out of this, up to 7.2% is caused by network interface contention alone. The study indicates that network contention becomes dominant for DSM systems using small caches, wide cache line sizes, low degrees of associativity, high processing node speeds, high memory speeds, low network speeds, or small network link widths.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128252528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 37
Broadcast-efficient sorting in the presence of few channels
K. Nakano, S. Olariu, J. Schwing
{"title":"Broadcast-efficient sorting in the presence of few channels","authors":"K. Nakano, S. Olariu, J. Schwing","doi":"10.1109/ICPP.1997.622534","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622534","url":null,"abstract":"We present simple and broadcast-efficient ranking and sorting algorithms on the broadcast communication model (BCM, for short) with few communication channels. At the heart of our algorithms is a new and elegant sampling and bucketing scheme whose main feature is that the resulting buckets are well balanced, making costly rebalancing unnecessary. The resulting ranking algorithm uses only 2n/k + o(n/k) broadcast rounds, while 3n/k + o(n/k) broadcast rounds are needed for sorting on a k-channel, n-processor BCM whenever k ≤ √(n/log n). These bounds are fairly tight when compared with the trivial lower bound of n/k broadcast rounds necessary to permute n items using k communication channels.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129281105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
An Euler path based technique for deadlock-free multicasting
N. Agrawal, C. Ravikumar
{"title":"An Euler path based technique for deadlock-free multicasting","authors":"N. Agrawal, C. Ravikumar","doi":"10.1109/ICPP.1997.622669","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622669","url":null,"abstract":"The existing algorithms for deadlock-free multicasting in interconnection networks assume the Hamiltonian property in the network topology. However, these networks fail to be Hamiltonian in the presence of faults. This paper investigates the use of Euler circuits in deadlock-free multicasting. Not only are Euler circuits known to exist in all connected networks, a fast polynomial-time algorithm exists to find an Euler circuit in a network. We present a multicasting algorithm which works for both regular and irregular topologies. Our algorithm is applicable to store-and-forward as well as wormhole-routed networks. We show that at most two virtual channels are required per physical channel for any connected network. We also prove that no virtual channels are required to achieve deadlock-free multicasting on a large class of networks. Unlike other existing algorithms for deadlock-free multicasting in faulty networks, our algorithm requires a small amount of information to be stored at each node. The potential of our technique is further illustrated with the help of various examples. A performance analysis on wormhole-routed networks shows that our routing algorithm outperforms existing multicasting procedures.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"41 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114012852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Improving the performance of out-of-core computations
M. Kandemir, J. Ramanujam, A. Choudhary
{"title":"Improving the performance of out-of-core computations","authors":"M. Kandemir, J. Ramanujam, A. Choudhary","doi":"10.1109/ICPP.1997.622574","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622574","url":null,"abstract":"The difficulty of handling out-of-core data limits the potential of parallel machines and high-end supercomputers. Since writing an efficient out-of-core version of a program is a difficult task and since virtual memory systems do not perform well on scientific computations, we believe that there is a clear need for a compiler-directed explicit I/O approach for out-of-core computations. In this paper, we present a compiler algorithm to optimize locality of disk accesses in out-of-core codes by choosing a good combination of file layouts on disks and loop transformations. The transformations change the access order of array data. Experimental results obtained on IBM SP-2 and Intel Paragon provide encouraging evidence that our approach is successful at optimizing programs which depend on disk-resident data in distributed-memory machines.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122033391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
Embedding of binomial trees in hypercubes with link faults
Jie Wu, E. Fernández, Ying-Chen Lo
{"title":"Embedding of binomial trees in hypercubes with link faults","authors":"Jie Wu, E. Fernández, Ying-Chen Lo","doi":"10.1109/ICPP.1997.622564","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622564","url":null,"abstract":"We study the embedding of binomial trees with variable roots in n-dimensional hypercubes (n-cubes) with faulty links. A simple embedding algorithm is first proposed that can embed an n-level binomial tree in an n-cube with up to n-1 faulty links in log(n-1) steps. We then extend the result to show that spanning binomial trees exist in a connected n-cube with up to [3(n-1)/2]-1 faulty links. Our results reveal the fault tolerance property of hypercubes and they can be used to predict the performance of broadcasting and reduction operations, where the binomial tree structure is commonly used.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127236969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
Performance and configuration of hierarchical ring networks for multiprocessors
V. Hamacher, Hong Jiang
{"title":"Performance and configuration of hierarchical ring networks for multiprocessors","authors":"V. Hamacher, Hong Jiang","doi":"10.1109/ICPP.1997.622653","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622653","url":null,"abstract":"Analytical queueing network models for expected message delay in 2-level and 3-level hierarchical-ring interconnection networks (INs) are developed. Such networks have recently been used in commercial and research prototype multiprocessors. A major class of traffic carried by these INs consists of cache line transfers, and associated coherency control messages, between processor caches and remote memory modules in shared-memory multiprocessors. Memory modules are assumed to be evenly distributed over the processor nodes. Such traffic consists of short, fixed-length messages. They can be conveniently transported using the slotted ring transmission technique, which is studied here. The message delay results derived from the models are shown to be quite accurate when checked against a simulation study. The comparisons to simulations include heavy traffic situations where queueing delays in ring crossover switches are significant for ring utilization levels of 80 to 90%. As well as facilitating analysis, the analytical models can be used to determine optimal sizes for the rings at different levels in the hierarchy under specified traffic distributions in a system with a given total number of processor nodes. Optimality is in terms of minimizing average message delay. A specific example of such a design exercise is provided for the uniform traffic case.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128895450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10