Proceedings 11th International Parallel Processing Symposium: Latest Publications

Data access reorganizations in compiling out-of-core data parallel programs on distributed memory machines
Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580956
M. Kandemir, R. Bordawekar, A. Choudhary
{"title":"Data access reorganizations in compiling out-of-core data parallel programs on distributed memory machines","authors":"M. Kandemir, R. Bordawekar, A. Choudhary","doi":"10.1109/IPPS.1997.580956","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580956","url":null,"abstract":"This paper describes optimization techniques for translating out-of-core programs written in a data parallel language to message passing node programs with explicit parallel I/O. We demonstrate that straightforward extension of in-core compilation techniques does not work well for out-of-core programs. We then describe how the compiler can optimize the code by (1) determining appropriate file layouts for out-of-core arrays, (2) permuting the loops in the nest(s) to allow efficient file access, and (3) partitioning the available node memory among references based on I/O cost estimation. Our experimental results indicate that these optimizations can reduce the amount of time spent in I/O by as much as an order of magnitude.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134123322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
An efficient technique of instruction scheduling on a superscalar-based multiprocessor
Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580840
R. Hwang
{"title":"An efficient technique of instruction scheduling on a superscalar-based multiprocessor","authors":"R. Hwang","doi":"10.1109/IPPS.1997.580840","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580840","url":null,"abstract":"An instruction scheduling approach is proposed for performance enhancement on a superscalar-based multiprocessor. The traditional list scheduling approach is not suitable for the environment because it does not consider the effect of synchronization operation. According to the LED loop theorem, the system performance is very concerned with the position of synchronization operation. Therefore, the scheduling of synchronization operation has the highest priority in this technique. There are two aspects of performance enhancement for the instruction scheduling approach: (1) converting LED into LFD, and (2) reducing the damage of LED. Experimental results show that the enhancement is significant.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134551795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An architecture workbench for multicomputers
Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580854
A. Pimentel, L. Hertzberger
{"title":"An architecture workbench for multicomputers","authors":"A. Pimentel, L. Hertzberger","doi":"10.1109/IPPS.1997.580854","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580854","url":null,"abstract":"The large design space of modern computer architectures calls for performance modelling tools to facilitate the evaluation of different alternatives. In this paper we give an overview of the Mermaid multicomputer simulation environment. This environment allows the evaluation of a wide range of architectural design tradeoffs while delivering reasonable simulation performance. To achieve this, simulation takes place at a level of abstract machine instructions rather than at the level of real instructions. Moreover, a less detailed mode of simulation is also provided. So when accuracy is not the primary objective, this simulation mode can yield high simulation efficiency. As a consequence, Mermaid makes both fast prototyping and accurate evaluation of multicomputer architectures feasible.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132518252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
k-ary n-trees: high performance networks for massively parallel architectures
Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580853
F. Petrini, M. Vanneschi
{"title":"k-ary n-trees: high performance networks for massively parallel architectures","authors":"F. Petrini, M. Vanneschi","doi":"10.1109/IPPS.1997.580853","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580853","url":null,"abstract":"The past few years have seen a rise in popularity of massively parallel architectures that use fat-trees as their interconnection networks. In this paper we study the communication performance of a parametric family of fat-trees, the k-ary n-trees, built with constant arity switches interconnected in a regular topology. Through simulation on a 4-ary 4-tree with 256 nodes, we analyze some variants of an adaptive algorithm that utilize wormhole routing with one, two and four virtual channels. The experimental results show that the uniform, bit reversal and transpose traffic patterns are very sensitive to the flow control strategy. In all these cases, the saturation points are between 35-40% of the network capacity with one virtual channel, 55-60% with two virtual channels and around 75% with four virtual channels. The complement traffic, a representative of the class of the congestion-free communication patterns, reaches an optimal performance, with a saturation point at 97% of the capacity for all flow control strategies.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131678822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 214
DPF: a data parallel Fortran benchmark suite
Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580896
Yu Hu, L. Johnsson, D. Kehagias, Nadia Shalaby
{"title":"DPF: a data parallel Fortran benchmark suite","authors":"Yu Hu, L. Johnsson, D. Kehagias, Nadia Shalaby","doi":"10.1109/IPPS.1997.580896","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580896","url":null,"abstract":"We present the Data Parallel Fortran (DPF) benchmark suite, a set of data parallel Fortran codes for evaluating data parallel compilers appropriate for any target parallel architecture, with shared or distributed memory. The codes are provided in basic, optimized and several library versions. The functionality of the benchmarks covers collective communication functions, scientific software library functions, and application kernels that reflect the computational structure and communication patterns in fluid dynamic simulations, fundamental physics and molecular studies in chemistry or biology. The DPF benchmark suite assumes the language model of High Performance Fortran, and provides performance evaluation metrics of busy and elapsed times and FLOP rates, FLOP count, memory usage, communication patterns, local memory access, and arithmetic efficiency as well as operation and communication counts per iteration. An instance of the benchmark suite was fully implemented in CM-Fortran and tested on the CM-5.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115406479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A formal model of software pipelining loops with conditions
Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580955
D. Milicev, Z. Jovanovic
{"title":"A formal model of software pipelining loops with conditions","authors":"D. Milicev, Z. Jovanovic","doi":"10.1109/IPPS.1997.580955","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580955","url":null,"abstract":"This paper addresses the problem of parallelizing loops with conditional branches in the context of software pipelining. A new formal approach to this problem is proposed in the form of Predicated Software Pipelining (PSP) model. The PSP model represents execution of a loop with conditional branches as transitions of a finite state machine. Each node of the state machine is composed of operations of one parallelized loop iteration. The rules for operation movements between nodes in the PSP model are described. The model represents a new theoretical framework for further investigation of inherent properties of these loops.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116712972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
A parallel algorithm for weighted distance transforms
Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580934
A. Fujiwara, M. Inoue, T. Masuzawa, H. Fujiwara
{"title":"A parallel algorithm for weighted distance transforms","authors":"A. Fujiwara, M. Inoue, T. Masuzawa, H. Fujiwara","doi":"10.1109/IPPS.1997.580934","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580934","url":null,"abstract":"This paper presents a parallel algorithm for the weighted distance transform and the nearest feature transform of an n × n binary image. We show that the algorithm runs in O(log n) time using n²/log n processors on the EREW PRAM and in O(log log n) time using n²/log log n processors on the common CRCW PRAM. We also show that the algorithm runs in O(n²/p² + n) time on a p × p mesh and in O(n²/p² + (n log p)/p) time on a p²-processor hypercube (for 1 ≤ p ≤ n). The algorithm is cost optimal on the PRAMs, on the mesh (for 1 ≤ p ≤ √n) and on the hypercube (for 1 ≤ p ≤ n/log n). We show that the time complexity on the EREW PRAM is time optimal.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114146320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
A 2-D parallel convex hull algorithm with optimal communication phases
Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580962
Jieliang Zhou, Xiaotie Deng, Patrick W. Dymond
{"title":"A 2-D parallel convex hull algorithm with optimal communication phases","authors":"Jieliang Zhou, Xiaotie Deng, Patrick W. Dymond","doi":"10.1109/IPPS.1997.580962","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580962","url":null,"abstract":"We investigate the problem of finding the two-dimensional convex hull of a set of points on a coarse-grained parallel computer. Recently Goodrich has devised a parallel sorting algorithm for n items on P processors which achieves an optimal number of communication phases for all ranges of P ≤ n. Ferreira et al. have recently introduced a deterministic convex hull algorithm with a constant number of communication phases for n and P satisfying n ≥ P^(1+ε). Here we obtain a new parallel 2-D convex hull algorithm with an optimal bound on number of communication phases for all values of P ≤ n while maintaining optimal local computation time.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127010019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Performance analysis and optimization on a parallel atmospheric general circulation model code
Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580879
J. Lou, J. Farrara
{"title":"Performance analysis and optimization on a parallel atmospheric general circulation model code","authors":"J. Lou, J. Farrara","doi":"10.1109/IPPS.1997.580879","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580879","url":null,"abstract":"An analysis is presented of the primary factors influencing the performance of a parallel implementation of the UCLA atmospheric general circulation model (AGCM) on distributed memory, massively parallel computer systems. Several modifications to the original parallel AGCM code aimed at improving its numerical efficiency, load balance and single node code performance are discussed. The impact of these optimization strategies on the performance on two of the state of the art parallel computers, the Intel Paragon and Cray T3D, is presented and analyzed. It is found that implementation of a load balanced FFT algorithm results in a reduction in overall execution time of approximately 45% compared to the original convolution based algorithm. Preliminary results of the application of a load balancing scheme for the physics part of the AGCM code suggest additional reductions in execution time of 15-20% can be achieved. Finally, several strategies for improving the single node performance of the code are presented, and the results obtained thus far suggest reductions in execution time in the range of 30-40% are possible.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"-1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123128243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Architecture-dependent tuning of the parameterized communication model for optimal multicasting
Proceedings 11th International Parallel Processing Symposium Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580959
N. Nupairoj, L. Ni, J. L. Park, Hyeong-Ah Choi
{"title":"Architecture-dependent tuning of the parameterized communication model for optimal multicasting","authors":"N. Nupairoj, L. Ni, J. L. Park, Hyeong-Ah Choi","doi":"10.1109/IPPS.1997.580959","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580959","url":null,"abstract":"A key issue in designing software multicast algorithms is to consider the trade-off between performance and portability. Portable software multicast algorithms based on generic communication models cannot capture some architecture-specific features. Without considering the underlying network architecture, these multicast algorithms may not achieve the truly optimal performance when implemented in real networks. The objective of this research is to investigate architecture-dependent tuning on performance of multicast algorithms developed based on architecture-independent models. Specifically, we intend to optimize the multicast algorithm based on the parameterized communication model. We propose two multicast algorithms, OPT-mesh and OPT-min, which are the optimized versions of the parameterized multicast algorithm for wormhole-switched mesh networks and BMIN networks, respectively. Using our flit-level simulator, the performance of both algorithms is compared with the architecture-independent version of the parameterized multicast algorithm and two other well-known network-dependent algorithms based on the binomial tree.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133341432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3