Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162): Latest Papers

Optimal sorting algorithms on incomplete meshes with arbitrary fault patterns
C. Yeh, B. Parhami
DOI: https://doi.org/10.1109/ICPP.1997.622530 · Published: 1997-08-11
Abstract: In this paper we propose simple and efficient algorithms for sorting on incomplete meshes. No hardware redundancy is required, and no assumption is made about the availability of a complete submesh. The proposed robust sorting algorithms are very efficient when only a few processors are faulty and degrade gracefully as the number of faults increases. In particular, we show that 1-1 sorting (one key per healthy processor) in row-major or snakelike row-major order can be performed in 3n + o(n) communication and comparison steps on an n × n incomplete mesh that has an arbitrary pattern of o(√n) faulty processors. This is the fastest algorithm reported thus far for sorting in row-major and snakelike row-major orders on faulty meshes, and its time complexity is quite close to the lower bound.
Citations: 8
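
The abstract targets snakelike row-major order. As a reference point only, the sketch below shows shearsort, a classic algorithm that produces this order on a complete, fault-free n × n mesh. It is not the authors' fault-tolerant 3n + o(n) algorithm: shearsort needs Θ(n log n) steps, and the function names here are illustrative. Rows or columns that a mesh would sort in parallel are sorted one after another, so the code models data movement rather than timing.

```python
def shearsort(grid):
    """Sort a complete n x n mesh into snakelike row-major order (shearsort).

    Rows/columns that a real mesh would handle in parallel are processed
    sequentially here; only the data movement is modeled, not the timing.
    """
    n = len(grid)
    g = [list(row) for row in grid]            # work on a copy of the input
    for _ in range((n - 1).bit_length()):      # ceil(log2 n) row/column rounds
        # Row phase: even rows ascending, odd rows descending (the "snake").
        for i in range(n):
            g[i].sort(reverse=(i % 2 == 1))
        # Column phase: every column ascending from top to bottom.
        for j in range(n):
            col = sorted(g[i][j] for i in range(n))
            for i in range(n):
                g[i][j] = col[i]
    # Final row phase puts each row into its snake direction.
    for i in range(n):
        g[i].sort(reverse=(i % 2 == 1))
    return g


def snake_order(grid):
    """Read a mesh in snakelike row-major order."""
    out = []
    for i, row in enumerate(grid):
        out.extend(row if i % 2 == 0 else reversed(row))
    return out


if __name__ == "__main__":
    mesh = [[15, 3, 9, 0], [7, 12, 1, 14], [4, 10, 6, 2], [13, 5, 11, 8]]
    print(snake_order(shearsort(mesh)))        # keys 0..15 in ascending order
```

Reading the result along the snake gives the sorted sequence; on an incomplete mesh, the faulty processors would additionally have to be skipped when defining this order.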
Compiler techniques for effective communication on distributed-memory multiprocessors
A. Navarro, E. Zapata, Y. Paek, D. Padua
DOI: https://doi.org/10.1109/ICPP.1997.622559 · Published: 1997-08-11
Abstract: The Polaris restructurer transforms conventional Fortran programs into parallel form for various types of multiprocessor systems. This paper presents the results of a study on strategies to improve the effectiveness of Polaris' techniques for distributed-memory multiprocessors. Our study, which is based on hand analysis of MDG and TRFD from the Perfect Benchmarks and of TOMCATV and SWIM from the SPEC benchmarks, identified three techniques that are important for improving communication optimization. Applying them yields nearly perfect speedups for all four programs on the Cray T3D.
Citations: 5
Performance analysis and simulation of multicast networks
Yuanyuan Yang, Jianchao Wang
DOI: https://doi.org/10.1109/ICPP.1997.622670 · Published: 1997-08-11
Abstract: In this paper, we look into the issue of supporting multicast in the well-known three-stage Clos network, or ν(m, n, r) network. We first develop an analytical model for the blocking probability of the ν(m, n, r) multicast network and then study the blocking behavior of the network under various routing control strategies through simulations. Our analytical and simulation results show that a ν(m, n, r) network with a small number of middle switches m, such as m = n + c or m = dn, where c and d are small constants, is almost nonblocking for multicast connections, although in theory it requires m ≥ Θ(n log r / log log r) to be nonblocking for multicast connections. We also demonstrate that routing control strategies are effective in reducing the blocking probability of the multicast network. The best routing control strategy provides a factor of 2 to 3 performance improvement over random routing. The results indicate that a ν(m, n, r) network with a cost comparable to that of a permutation network can provide cost-effective support for multicast communication.
Citations: 3
Stride-directed prefetching for secondary caches
Sunil Kim, A. Veidenbaum
DOI: https://doi.org/10.1109/ICPP.1997.622661 · Published: 1997-08-11
Abstract: This paper studies hardware prefetching for second-level (L2) caches. Previous work on prefetching has been extensive but largely directed at primary caches. In some cases, only L2 prefetching is possible or is more appropriate. By studying L2 prefetching characteristics, we show that existing stride-directed methods for L1 caches do not work as well in L2 caches. We propose a new stride-detection mechanism for L2 prefetching and combine it with the stream buffers of Palacharla and Kessler (1994). Our evaluation shows that this new prefetching scheme is more effective than stream buffer prefetching, particularly for applications with long-stride accesses. Finally, we evaluate an L2 cache prefetching organization that combines a small L2 cache with our stride-directed prefetching scheme. Our results show that this system performs significantly better than stream buffer prefetching or a larger non-prefetching L2 cache, without a significant increase in memory traffic.
Citations: 28
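
To make "stride-directed" concrete, the sketch below is a generic toy stride detector, not the L2 mechanism proposed in the paper. It assumes each access stream is identified by some key (hypothetically a load PC or a memory region) and that a stride counts as confirmed once the same non-zero stride is seen twice in a row; the class and parameter names (`StrideDetector`, `degree`) are illustrative.

```python
from dataclasses import dataclass


@dataclass
class StreamEntry:
    last_addr: int
    stride: int = 0


class StrideDetector:
    """Toy stride detector (hypothetical; not the mechanism from the paper).

    Each access stream is identified by a key (e.g. a load PC or a memory
    region). Once the same non-zero stride is seen twice in a row, the
    detector proposes `degree` addresses ahead along that stride.
    """

    def __init__(self, degree=2):
        self.degree = degree
        self.table = {}

    def access(self, key, addr):
        entry = self.table.get(key)
        if entry is None:                      # first reference: just record it
            self.table[key] = StreamEntry(last_addr=addr)
            return []
        new_stride = addr - entry.last_addr
        confirmed = new_stride != 0 and new_stride == entry.stride
        entry.stride, entry.last_addr = new_stride, addr
        if not confirmed:
            return []
        # Candidate prefetch addresses along the confirmed stride.
        return [addr + k * new_stride for k in range(1, self.degree + 1)]


if __name__ == "__main__":
    det = StrideDetector(degree=2)
    for a in (0x1000, 0x1400, 0x1800, 0x1C00):
        print(hex(a), [hex(p) for p in det.access("loop1", a)])
```

A stride of 0x400 bytes skips many cache blocks at typical block sizes, so a purely sequential stream buffer would fetch the wrong blocks for such a stream; this is the kind of long-stride pattern the abstract refers to.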
Efficient multicast algorithms in all-port wormhole-routed hypercubes
Vivek Halwan, F. Özgüner
DOI: https://doi.org/10.1109/ICPP.1997.622562 · Published: 1997-08-11
Abstract: This paper presents several recursive heuristic methods for multicasting in all-port, dimension-ordered, wormhole-routed hypercubes. The methods described are stepwise contention-free and are primarily designed to reduce the number of communication steps. Experiments show that the number of steps can be significantly reduced compared with previously described depth contention-free solutions. These methods are also shown to be source-controlled depth contention-free and can be considered a generalization of the broadcast method described by C.T. Ho and M. Kao (1995), which is the most efficient method known.
Citations: 3
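
The multicast methods build on dimension-ordered (e-cube) routing. The sketch below shows only that underlying routing and a simplified contention check: it treats a set of unicast worms sent in the same step as contention-free when no two of them use the same directed link. The helper names are hypothetical, and the paper's recursive heuristics are not reproduced.

```python
def ecube_path(src, dst, dims):
    """Nodes visited by a dimension-ordered (e-cube) route in a dims-cube.

    Differing address bits are corrected from the lowest dimension upward.
    """
    path = [src]
    node = src
    for d in range(dims):
        if ((node ^ dst) >> d) & 1:            # bit d still differs: cross dimension d
            node ^= 1 << d
            path.append(node)
    return path


def channels(path):
    """Directed links (u, v) used by a path."""
    return {(path[i], path[i + 1]) for i in range(len(path) - 1)}


def contention_free(paths):
    """Simplified check: True if no two paths share a directed link."""
    used = set()
    for p in paths:
        links = channels(p)
        if used & links:
            return False
        used |= links
    return True


if __name__ == "__main__":
    p3, p6, p1 = (ecube_path(0, t, 3) for t in (3, 6, 1))
    print(p3, p6, contention_free([p3, p6]))   # [0, 1, 3] [0, 2, 6] True
    print(contention_free([p3, p1]))           # False: both use link (0, 1)
```

In the all-port model, node 0 can inject the worms to nodes 3 and 6 in the same step because their routes leave on different dimensions, whereas the worms to nodes 3 and 1 both need link (0, 1).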
An improved analytical model for wormhole routed networks with application to butterfly fat-trees
R. I. Greenberg, L. Guan
DOI: https://doi.org/10.1109/ICPP.1997.622554 · Published: 1997-08-11
Abstract: A performance model for wormhole-routed interconnection networks is presented and applied to the butterfly fat-tree network. The model agrees very closely with experimental results over a wide range of load rates. Novel aspects of the model, leading to accurate and simple performance predictions, include (1) the use of multiple-server queues and (2) a general method of correcting queueing results based on Poisson arrivals so that they apply to wormhole routing. These ideas can also be applied to other networks.
Citations: 28
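
The model's first ingredient is multiple-server queues. As background only, the sketch below computes the textbook M/M/c (Erlang C) waiting probability and mean waiting time, which assume Poisson arrivals and exponential service. Exponential service is an assumption of this sketch, and the paper's correction that adapts such results to wormhole routing is not shown.

```python
import math


def erlang_c(c, lam, mu):
    """Probability that an arrival must wait in an M/M/c queue."""
    a = lam / mu                     # offered load in Erlangs
    rho = a / c                      # per-server utilization
    if not 0 <= rho < 1:
        raise ValueError("queue is unstable (need lam < c * mu)")
    tail = a ** c / math.factorial(c) / (1 - rho)
    head = sum(a ** k / math.factorial(k) for k in range(c))
    return tail / (head + tail)


def mmc_mean_wait(c, lam, mu):
    """Mean time spent waiting (excluding service) in an M/M/c queue."""
    return erlang_c(c, lam, mu) / (c * mu - lam)


if __name__ == "__main__":
    # Two separate single-server queues vs. one pooled two-server queue,
    # same per-server utilization (0.75) in both cases.
    separate = mmc_mean_wait(1, 0.75, 1.0)
    pooled = mmc_mean_wait(2, 1.5, 1.0)
    print(round(separate, 3), round(pooled, 3))   # 3.0 1.286
```

The comparison hints at why a single queue with several servers can be a better fit than independent single-server queues when a message may use any of several equivalent channels, although the exact way the paper applies this is described only briefly in the abstract.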
Background compensation and an active-camera motion tracking algorithm
R. Gupta, M. Theys, H. Siegel
DOI: https://doi.org/10.1109/ICPP.1997.622677 · Published: 1997-08-11
Abstract: Motion tracking using an active camera is a computationally complex problem. Existing serial algorithms have provided frame rates much lower than those desired, mainly because of a lack of computational resources. Parallel computers are well suited to image processing tasks and can provide the computational power required by real-time motion tracking algorithms. This paper develops a parallel implementation of a known serial motion tracking algorithm with two goals: achieving frame rates greater than real time, and studying the effects of data layout, choice of parallel mode of execution, and machine size on the execution time of this algorithm. A distinguishing feature of this application study is that the relevant portion of each image frame changes from one frame to the next, depending on the camera motion. This affects how the chosen data layout influences the required inter-processor data transfers and how work is distributed among the processors. Experiments were performed to determine which data layout performs better for given image sizes and numbers of processors. The parallel computers used in this study are the MasPar MP-1, the Intel Paragon, and PASM. Different modes of execution are examined, and mixed-mode implementations are found to be faster than SIMD or MIMD implementations.
Citations: 6
Hindsight helps: deterministic task scheduling with backtracking
Yueh-O Wang, N. Amato, D. Friesen
DOI: https://doi.org/10.1109/ICPP.1997.622582 · Published: 1997-08-11
Abstract: This paper considers the problem of scheduling a set of precedence-related tasks on a nonpreemptive, homogeneous, message-passing multiprocessor system in order to minimize the makespan, that is, the completion time of the last task relative to the start time of the first task. We propose a family of scheduling algorithms, called IPR (immediate predecessor rescheduling), that use one level of backtracking. We also develop a unifying framework to facilitate comparison between our results and the various models and algorithms that have been previously studied. We show, both theoretically and experimentally, that the IPR algorithms outperform previous algorithms in terms of both time complexity and the makespans of the resulting schedules. Moreover, our simulation results indicate that the relative advantage of the IPR algorithms increases as the communication constraint is relaxed.
Citations: 3
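
The objective above can be made concrete with a small evaluator. The sketch below computes the makespan of a given nonpreemptive schedule under simplifying assumptions of its own: the task-to-processor assignment and per-processor order are fixed, a message delay is paid only when a predecessor ran on a different processor, and link contention is ignored. It evaluates schedules only; the IPR algorithms and their backtracking are not shown, and all names are hypothetical.

```python
def makespan(exec_time, preds, comm, proc, order):
    """Evaluate a fixed nonpreemptive schedule and return its makespan.

    exec_time: task -> execution time
    preds:     task -> list of immediate predecessors
    comm:      (u, v) -> message delay, paid only if u and v are on
               different processors
    proc:      task -> processor
    order:     processor -> tasks in the order that processor runs them
    """
    start, finish = {}, {}
    free_at = {p: 0 for p in order}                  # when each processor is idle
    queues = {p: list(ts) for p, ts in order.items()}
    while any(queues.values()):
        progressed = False
        for p, queue in queues.items():
            if not queue:
                continue
            t = queue[0]
            if not all(u in finish for u in preds.get(t, [])):
                continue                             # a predecessor has not run yet
            # Data from predecessor u arrives at finish[u], plus delay if remote.
            ready = max((finish[u] + (0 if proc[u] == p else comm.get((u, t), 0))
                         for u in preds.get(t, [])), default=0)
            start[t] = max(ready, free_at[p])
            finish[t] = start[t] + exec_time[t]
            free_at[p] = finish[t]
            queue.pop(0)
            progressed = True
        if not progressed:
            raise ValueError("processor orders contradict the precedence constraints")
    # Makespan: completion of the last task relative to the start of the first.
    return max(finish.values()) - min(start.values())


if __name__ == "__main__":
    exec_time = {"A": 2, "B": 3, "C": 1}
    preds = {"B": ["A"], "C": ["A"]}
    comm = {("A", "B"): 4, ("A", "C"): 4}
    split = makespan(exec_time, preds, comm,
                     proc={"A": 0, "B": 0, "C": 1}, order={0: ["A", "B"], 1: ["C"]})
    together = makespan(exec_time, preds, comm,
                        proc={"A": 0, "B": 0, "C": 0}, order={0: ["A", "B", "C"]})
    print(split, together)                           # 7 6
```

In this toy case, running C on a second processor is worse than serializing everything on one processor, because C must wait for a 4-unit message from A; trade-offs of this kind are what communication-aware schedulers have to weigh.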
Adaptive load-balancing algorithms using symmetric broadcast networks: performance study on an IBM SP2
Sajal K. Das, Daniel J. Harvey, R. Biswas
DOI: https://doi.org/10.1109/ICPP.1997.622667 · Published: 1997-08-11
Abstract: In a distributed-computing environment, it is important to ensure that processor workloads are adequately balanced. Among numerous load-balancing algorithms, a unique approach due to Das and Prasad defines a symmetric broadcast network (SBN) that provides a robust communication pattern among the processors in a topology-independent manner. In this paper, we propose and analyze three SBN-based load-balancing algorithms and implement them on an IBM SP2. A thorough experimental study with Poisson-distributed synthetic loads demonstrates that these algorithms are very effective in balancing system load while minimizing processor idle time. They also compare favorably with several existing techniques.
Citations: 21
Network performance under physical constraints
F. Petrini, M. Vanneschi
DOI: https://doi.org/10.1109/ICPP.1997.622550 · Published: 1997-08-11
Abstract: The performance of an interconnection network in a massively parallel architecture is subject to physical constraints whose impact needs to be re-evaluated from time to time. Fat-trees and low-dimensional cubes have attracted great interest in the scientific community in the last few years and are emerging standards in the design of interconnection networks for massively parallel computers. In this paper, we compare the communication performance of these two classes of interconnection networks using a detailed simulation model. The comparison uses a set of synthetic benchmarks and takes into account physical constraints, such as pin and bandwidth limitations, as well as router complexity. In our experiments, we consider two networks with 256 nodes: a 16-ary 2-cube and a 4-ary 4-tree.
Citations: 7
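
As a quick arithmetic check that the two networks are the same size, the sketch below computes basic structural parameters from the usual definitions: a k-ary n-cube with wraparound links (a torus) has k^n nodes and diameter n·⌊k/2⌋, and a k-ary n-tree has k^n processing nodes and n·k^(n−1) switches with 2k ports each. These definitions are assumptions of the sketch rather than details taken from the abstract, and none of the paper's simulation results are reproduced.

```python
def kary_ncube(k, n):
    """Basic parameters of a k-ary n-cube with wraparound links (a torus)."""
    return {
        "nodes": k ** n,
        "routers": k ** n,                       # one router per node
        # Network ports per router, excluding the local processor port;
        # for k == 2 the +1 and -1 neighbors in a dimension coincide.
        "router_ports": 2 * n if k > 2 else n,
        "diameter": n * (k // 2),                # hops, using wraparound links
    }


def kary_ntree(k, n):
    """Basic parameters of a k-ary n-tree (a fat-tree of 2k-port switches)."""
    return {
        "nodes": k ** n,                         # processing nodes at the leaves
        "switches": n * k ** (n - 1),
        "switch_ports": 2 * k,                   # k down, k up (top-level up ports unused)
    }


if __name__ == "__main__":
    print("16-ary 2-cube:", kary_ncube(16, 2))
    print("4-ary 4-tree: ", kary_ntree(4, 4))
```

Both configurations connect 256 nodes, but a fat-tree switch has twice as many network ports as a torus router here, which is one reason pin limitations and router complexity matter in the comparison.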