Latest papers from Proceedings of SPDP '96: 8th IEEE Symposium on Parallel and Distributed Processing

Embedding of k-ary complete trees into hypercubes with optimal load
Jan Trdlicka, P. Tvrdík
{"title":"Embedding of k-ary complete trees into hypercubes with optimal load","authors":"Jan Trdlicka, P. Tvrdík","doi":"10.1109/SPDP.1996.570390","DOIUrl":"https://doi.org/10.1109/SPDP.1996.570390","url":null,"abstract":"The main result of the paper is an algorithm for embedding k-ary complete trees into hypercubes with optimal load and asymptotically optimal dilation. The algorithm is fully scalable: the dimension of the hypercube can be chosen independently of the arity and height of the complete tree. The basic property of the embedded tree is that both all the tree nodes at a given level and all the tree nodes together are uniformly distributed within equally-sized subcubes of the hypercube. This implies that no hypercube node is loaded with more than ⌈A_h/2^n⌉ tree nodes and ⌈B_h/2^n⌉ leaves of the tree, where A_h is the number of all tree nodes, B_h is the number of leaves of the k-ary complete tree of height h, and n is the dimension of the hypercube. The embedding enables optimal emulations of both divide and conquer computations on the k-ary complete tree, where only one level of nodes is active at a time, and general computations based on k-ary complete trees, where all tree nodes are active simultaneously. As a special case the authors obtain an algorithm for embedding the k-ary complete tree of height h into its optimal hypercube with load 1 and with dilation that is worse than the lower bound by only a small constant factor. This improves the best previous result by Shen et al. (1995), whose embedding has load 1 and nearly optimal dilation, but requires a hypercube much larger than the optimal one.","PeriodicalId":360478,"journal":{"name":"Proceedings of SPDP '96: 8th IEEE Symposium on Parallel and Distributed Processing","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121210858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
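The load bound quoted in the abstract above (at most ⌈A_h/2^n⌉ tree nodes and ⌈B_h/2^n⌉ leaves per hypercube node) can be evaluated numerically. The following is a minimal sketch, not the authors' embedding algorithm. It assumes the common convention that the root sits at level 0 and the leaves at level h, so that A_h = (k^(h+1) - 1)/(k - 1) and B_h = k^h; the paper may define height differently. The function names are hypothetical.

```python
def complete_kary_tree_counts(k: int, h: int) -> tuple[int, int]:
    """Return (A_h, B_h) for a complete k-ary tree of height h.

    Assumption: root at level 0, leaves at level h (not confirmed by the abstract).
    """
    if k < 2 or h < 0:
        raise ValueError("expected arity k >= 2 and height h >= 0")
    leaves = k ** h                          # B_h: nodes on the last level
    total = (k ** (h + 1) - 1) // (k - 1)    # A_h: geometric series 1 + k + ... + k^h
    return total, leaves


def load_bounds(k: int, h: int, n: int) -> tuple[int, int]:
    """Return (ceil(A_h / 2^n), ceil(B_h / 2^n)) for an n-dimensional hypercube."""
    if n < 0:
        raise ValueError("expected hypercube dimension n >= 0")
    total, leaves = complete_kary_tree_counts(k, h)
    cube_nodes = 2 ** n
    # Integer ceiling division avoids floating-point rounding for large trees.
    return -(-total // cube_nodes), -(-leaves // cube_nodes)


if __name__ == "__main__":
    # Ternary tree of height 4 on a 5-dimensional hypercube (32 nodes).
    print(load_bounds(3, 4, 5))
```

By the pigeonhole principle, no embedding can place fewer than ⌈A_h/2^n⌉ tree nodes on the most heavily loaded hypercube node, which is why the abstract calls this load optimal.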
Extending functional languages with stateful computations
Yung-Syau Chen, J. Gaudiot
{"title":"Extending functional languages with stateful computations","authors":"Yung-Syau Chen, J. Gaudiot","doi":"10.1109/SPDP.1996.570381","DOIUrl":"https://doi.org/10.1109/SPDP.1996.570381","url":null,"abstract":"A new approach in which stateful computations can be performed within the framework of a functional programming language is presented. In most functional programming languages, programmers are unable to easily manipulate state-based computations which are not supported by functional languages. To solve this problem, the authors propose to extend the Sisal language with special user declared variables. This approach can greatly help users in writing programs, simplifying parallel compilation, and improving performance. Under this scheme, programmers are able to manipulate stateful computations. In the methodology, programmers are allowed to declare special variables, and the parallel threads can be identified according to the usage of special variables. When compared to \"pure\" functional languages, the extended Sisal has more expressive power due to the availability of stateful computations.","PeriodicalId":360478,"journal":{"name":"Proceedings of SPDP '96: 8th IEEE Symposium on Parallel and Distributed Processing","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124569685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A compiler address transformation for conflict-free access of memories and networks
M. Al-Mouhamed, L. Bic, Husam Abu-Haimed
{"title":"A compiler address transformation for conflict-free access of memories and networks","authors":"M. Al-Mouhamed, L. Bic, Husam Abu-Haimed","doi":"10.1109/SPDP.1996.570378","DOIUrl":"https://doi.org/10.1109/SPDP.1996.570378","url":null,"abstract":"A method for mapping arrays into parallel memories to minimize serialization and network conflicts for lock-step systems is presented. Each array is associated with an arbitrary number of data access patterns that can be identified following compiler data-dependence analysis. Conditions for conflict-free access of parallel memories and network are derived for arbitrary power-of-2 data patterns and arbitrary multistage networks. The authors propose an efficient heuristic to synthesize combined address transformation (NP-complete) which applies to arbitrary linear patterns, arbitrary multistage networks, and an arbitrary number of power-of-2 memories. The method can be implemented as part of the address transformation (Xor and And) or through compiler emulation. The performance of optimized storage schemes is presented for FFT, arbitrary sets of data patterns, non power-of-2 stride access in vector processors, interleaving, and static row-column storages. Their approach is profitable in all the above cases and provides a systematic method for converting array-memory mapping and network aspects of algorithms from one network topology to another.","PeriodicalId":360478,"journal":{"name":"Proceedings of SPDP '96: 8th IEEE Symposium on Parallel and Distributed Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116430567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
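The abstract above mentions address transformations built from Xor and And operations. As an illustration of that general idea only, here is a minimal sketch of a textbook XOR-skewing scheme for 2^m memory banks. It is not the heuristic proposed by the authors, and the names `plain_bank`, `xor_bank`, and `is_conflict_free` are hypothetical. The bank index is formed by XOR-ing the low m address bits with the next m bits, so that both unit-stride and stride-2^m accesses touch distinct banks.

```python
def plain_bank(addr: int, m: int) -> int:
    """Low-order interleaving: bank = addr mod 2^m (And with a mask)."""
    return addr & ((1 << m) - 1)


def xor_bank(addr: int, m: int) -> int:
    """Illustrative XOR skewing: low m bits XOR the next m bits (Xor, then And)."""
    return (addr ^ (addr >> m)) & ((1 << m) - 1)


def is_conflict_free(bank_fn, base: int, stride: int, count: int, m: int) -> bool:
    """True if `count` accesses starting at `base` with `stride` hit distinct banks."""
    banks = [bank_fn(base + i * stride, m) for i in range(count)]
    return len(set(banks)) == count


if __name__ == "__main__":
    m = 3  # 8 banks
    for stride in (1, 8):
        print(stride,
              is_conflict_free(plain_bank, 0, stride, 8, m),
              is_conflict_free(xor_bank, 0, stride, 8, m))
```

With plain interleaving, a stride of 2^m sends every access to the same bank; the skewed mapping spreads those accesses across all banks while keeping unit-stride access conflict-free.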
An empirical study of dynamic scheduling on rings of processors
M. E. Barrows, Dawn E. Gregory, Lixin Gao, A. Rosenberg, P. Cohen
{"title":"An empirical study of dynamic scheduling on rings of processors","authors":"M. E. Barrows, Dawn E. Gregory, Lixin Gao, A. Rosenberg, P. Cohen","doi":"10.1109/SPDP.1996.570370","DOIUrl":"https://doi.org/10.1109/SPDP.1996.570370","url":null,"abstract":"The authors empirically analyze and compare two distributed low-overhead policies for scheduling dynamic tree-structured computations on rings of identical PEs. The experiments show that both policies give significant parallel speedup on large classes of computations, and that one yields almost optimal speedup on moderate size rings. They believe that the methodology of experiment design and analysis will prove useful in other such studies.","PeriodicalId":360478,"journal":{"name":"Proceedings of SPDP '96: 8th IEEE Symposium on Parallel and Distributed Processing","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130565461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Performance of parallel algorithms for a fingerprint image comparison system
H. Ammar, Zhouhui Miao
{"title":"Performance of parallel algorithms for a fingerprint image comparison system","authors":"H. Ammar, Zhouhui Miao","doi":"10.1109/SPDP.1996.570362","DOIUrl":"https://doi.org/10.1109/SPDP.1996.570362","url":null,"abstract":"This paper addresses the problem of analyzing the performance of parallel algorithms for the training procedure of a neural network based fingerprint image comparison (FIC) system. The target architecture is assumed to be a coarse-grain distributed memory parallel architecture. Two types of parallelism: node parallelism and training set parallelism (TSP) are investigated. These algorithms are implemented on a 32 node CM-5. Theoretical analysis and experimental results comparing the performance of these algorithms are presented.","PeriodicalId":360478,"journal":{"name":"Proceedings of SPDP '96: 8th IEEE Symposium on Parallel and Distributed Processing","volume":"311 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124423713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An efficient parallel scheduling algorithm
Minyou Wu
{"title":"An efficient parallel scheduling algorithm","authors":"Minyou Wu","doi":"10.1109/SPDP.1996.570342","DOIUrl":"https://doi.org/10.1109/SPDP.1996.570342","url":null,"abstract":"Most static scheduling algorithms that schedule parallel programs represented by directed acyclic graphs (DAGs) are sequential. This paper discusses the essential issues on parallelization of static scheduling algorithms. An efficient parallel scheduling algorithm, the HPMCP algorithm, is proposed. It produces high-quality scheduling and is much faster than existing algorithms.","PeriodicalId":360478,"journal":{"name":"Proceedings of SPDP '96: 8th IEEE Symposium on Parallel and Distributed Processing","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117043057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Impact of load balancing on unstructured adaptive grid computations for distributed-memory multiprocessors
A. Sohn, R. Biswas, H. Simon
{"title":"Impact of load balancing on unstructured adaptive grid computations for distributed-memory multiprocessors","authors":"A. Sohn, R. Biswas, H. Simon","doi":"10.1109/SPDP.1996.570313","DOIUrl":"https://doi.org/10.1109/SPDP.1996.570313","url":null,"abstract":"The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a new dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view. Whenever the computational mesh is adapted, JOVE is activated to eliminate the load imbalance. JOVE has been implemented on an IBM SP2 distributed-memory machine in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. We also show that JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.","PeriodicalId":360478,"journal":{"name":"Proceedings of SPDP '96: 8th IEEE Symposium on Parallel and Distributed Processing","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127834379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
Measurement and simulation based performance analysis of parallel I/O in a high-performance cluster system
C. Natarajan, R. Iyer
{"title":"Measurement and simulation based performance analysis of parallel I/O in a high-performance cluster system","authors":"C. Natarajan, R. Iyer","doi":"10.1109/SPDP.1996.570351","DOIUrl":"https://doi.org/10.1109/SPDP.1996.570351","url":null,"abstract":"This paper presents a measurement and simulation based study of parallel I/O in a high-performance cluster system: the Pittsburgh Supercomputing Center (PSC) DEC Alpha Supercluster. The measurements were used to characterize the performance bottlenecks and the throughput limits at the compute and I/O nodes, and to provide realistic input parameters to PioSim, a simulation environment we have developed to investigate parallel I/O performance issues in cluster systems. PioSim was used to obtain a detailed characterization of parallel I/O performance, in the high performance cluster system, for different regular access patterns and different system configurations. This paper also explores the use of local disks at the compute nodes for parallel I/O, and finds that the local disk architecture outperforms the traditional parallel I/O over remote I/O node disks architecture, even when as much as 68-75% of the requests from each compute node goes to remote disks.","PeriodicalId":360478,"journal":{"name":"Proceedings of SPDP '96: 8th IEEE Symposium on Parallel and Distributed Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1996-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132984432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3