Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation: Latest Articles

Packing/unpacking information generation for efficient generalized kr→r and r→kr array redistribution
Ching-Hsien Hsu, Yeh-Ching Chung, C. Dow
{"title":"Packing/unpacking information generation for efficient generalized kr→r and r→kr array redistribution","authors":"Ching-Hsien Hsu, Yeh-Ching Chung, C. Dow","doi":"10.1109/FMPC.1999.750588","DOIUrl":"https://doi.org/10.1109/FMPC.1999.750588","url":null,"abstract":"Array redistribution is usually required to enhance algorithm performance in many parallel programs on distributed memory multicomputers. Since it is performed at run-time, there is a performance tradeoff between the efficiency of new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present efficient methods to generate the packing/unpacking information for BLOCK-CYCLIC(kr) to BLOCK-CYCLIC(r) and BLOCK-CYCLIC(r) to BLOCK-CYCLIC(kr) redistribution with arbitrary source/destination processor sets. The most significant improvement of this paper is that a processor does not need to construct the send/receive data sets for a redistribution. Based on the packing/unpacking information derived from kr→r and r→kr redistributions, a processor can pack/unpack array elements into (from) messages directly. To evaluate the performance of our methods, we have implemented them along with the PITFALLS method and Prylli's method on an IBM SP2 parallel machine. The experimental results show that our algorithms outperform the PITFALLS method and Prylli's method for all test samples.","PeriodicalId":405655,"journal":{"name":"Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114968924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient VLSI layouts of hypercubic networks
C. Yeh, Emmanouel Varvarigos, B. Parhami
{"title":"Efficient VLSI layouts of hypercubic networks","authors":"C. Yeh, Emmanouel Varvarigos, B. Parhami","doi":"10.1109/FMPC.1999.750589","DOIUrl":"https://doi.org/10.1109/FMPC.1999.750589","url":null,"abstract":"In this paper we present efficient VLSI layouts of several hypercubic networks. We show that an N-node hypercube and an N-node cube-connected cycles (CCC) graph can be laid out in 4N^2/9 + o(N^2) and 4N^2/(9 log_2^2 N) + o(N^2/log^2 N) areas, respectively, both of which are optimal within a factor of 1.7~+o(1). We introduce the multilayer grid model, and present efficient layouts of hypercubes that use more than 2 layers of wires. We derive efficient layouts for butterfly networks, generalized hypercubes, hierarchical swapped networks, and indirect swapped networks, that are optimal within a factor of 1+o(1). We also present efficient layouts for folded hypercubes, reduced hypercubes, recursive hierarchical swapped networks, and enhanced-cubes, which are the best results reported for these networks thus far.","PeriodicalId":405655,"journal":{"name":"Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121159253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
Java for numerically intensive computing: from flops to gigaflops
S. Midkiff, J. Moreira, M. Snir
{"title":"Java for numerically intensive computing: from flops to gigaflops","authors":"S. Midkiff, J. Moreira, M. Snir","doi":"10.1109/FMPC.1999.750607","DOIUrl":"https://doi.org/10.1109/FMPC.1999.750607","url":null,"abstract":"Java is not thought of as being competitive with Fortran for numerical programming. In this paper, we discuss technologies that can and will deliver Fortran-like performance in Java. These techniques include new and existing compiler technologies, the exploitation of parallelism, and a collection of Java libraries for numerical computing. We also present experimental data to show the effectiveness of our approaches. In particular we achieve 1 Gflops with a linear algebra kernel on an RS/6000 SMP machine. Most of these techniques require no language changes; a few depend on extensions to Java currently under consideration.","PeriodicalId":405655,"journal":{"name":"Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128941574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
A framework for generating task parallel programs
U. Fissgus, T. Rauber, G. Runger
{"title":"A framework for generating task parallel programs","authors":"U. Fissgus, T. Rauber, G. Runger","doi":"10.1109/FMPC.1999.750586","DOIUrl":"https://doi.org/10.1109/FMPC.1999.750586","url":null,"abstract":"We consider the generation of mixed task and data parallel programs and discuss how a clear separation into a task and data parallel level can support the development of efficient programs. The program development starts with a specification of the maximum degree of task and data parallelism and proceeds by performing several derivation steps in which the degree of parallelism is adapted to a specific parallel machine. We show how the final message-passing programs are generated and how the interaction between the task and data parallel levels can be established. We demonstrate the usefulness of the approach by examples from numerical analysis which offer the potential of a mixed task and data parallel execution but for which it is not a priori clear how this potential should be used for an implementation on a specific parallel machine.","PeriodicalId":405655,"journal":{"name":"Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117147946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
A recursive PVM implementation of an image segmentation algorithm with performance results comparing the HIVE and the Cray T3E
J. Tilton
{"title":"A recursive PVM implementation of an image segmentation algorithm with performance results comparing the HIVE and the Cray T3E","authors":"J. Tilton","doi":"10.1109/FMPC.1999.750594","DOIUrl":"https://doi.org/10.1109/FMPC.1999.750594","url":null,"abstract":"A recursive PVM (Parallel Virtual Machine) implementation of a high quality but computationally intensive image segmentation approach is described and the performance of the algorithm on the HIVE and on the Cray T3E is contrasted. The image segmentation algorithm, which is designed for the analysis of multispectral or hyperspectral remotely sensed imagery data, is a hybrid of region growing and spectral clustering that produces a hierarchical set of image segmentations based on detected natural convergence points. The HIVE is a Beowulf-class parallel computer consisting of 66 Pentium Pro PCs (64 slaves and 2 controllers) with 2 processors per PC (for 128 total slave processors) which was developed and assembled by the Applied Information Sciences Branch at NASA's Goddard Space Flight Center. The Cray T3E is a supercomputer with 512 available processors, which is installed at the NASA Center for Computational Science at NASA's Goddard Space Flight Center. Timing results on Landsat Multispectral Scanner data show that the algorithm runs approximately 1.5 times faster on the HIVE, even though the HIVE is some 86 times less costly than the Cray T3E.","PeriodicalId":405655,"journal":{"name":"Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124472412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
A data-parallel algorithm for iterative tomographic image reconstruction
C. Johnson, A. Sofer
{"title":"A data-parallel algorithm for iterative tomographic image reconstruction","authors":"C. Johnson, A. Sofer","doi":"10.1109/FMPC.1999.750592","DOIUrl":"https://doi.org/10.1109/FMPC.1999.750592","url":null,"abstract":"In the tomographic imaging problem, images are reconstructed from a set of measured projections. Iterative reconstruction methods are computationally intensive alternatives to the more traditional Fourier-based methods. Despite their high cost, the popularity of these methods is increasing because of the advantages they offer. Although numerous iterative methods have been proposed over the years, all of these methods can be shown to have a similar computational structure. This paper presents a parallel algorithm that we originally developed for performing the expectation maximization algorithm in emission tomography. This algorithm is capable of exploiting the sparsity and symmetries of the model in a computationally efficient manner. Our parallelization scheme is based upon decomposition of the measurement-space vectors. We demonstrate that such a parallelization scheme is applicable to the vast majority of iterative reconstruction algorithms proposed to date.","PeriodicalId":405655,"journal":{"name":"Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125930368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 39
Implementing MM5 on NASA Goddard Space Flight Center computing systems: a performance study
J. Dorband, J. Kouatchou, J. Michalakes, U. Ranawake
{"title":"Implementing MM5 on NASA Goddard Space Flight Center computing systems: a performance study","authors":"J. Dorband, J. Kouatchou, J. Michalakes, U. Ranawake","doi":"10.1109/FMPC.1999.750601","DOIUrl":"https://doi.org/10.1109/FMPC.1999.750601","url":null,"abstract":"We analyze and test the performance of the fifth-generation PSU/NCAR mesoscale model MM5 on parallel computers at NASA Goddard Space Flight Center. We show how MM5 code scales on the Cray J90, the Cray T3E and a cluster of PCs. More precisely, we are interested in finding the elapsed time, load balancing, speedup, number of floating point operations per second, and performance versus cost. Results obtained with two test problems show the efficiency of MM5 on the above computers especially with large size problems.","PeriodicalId":405655,"journal":{"name":"Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127667478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Optimization of a parallel pseudospectral MHD code
A. Dubey, T. Clune
{"title":"Optimization of a parallel pseudospectral MHD code","authors":"A. Dubey, T. Clune","doi":"10.1109/FMPC.1999.750602","DOIUrl":"https://doi.org/10.1109/FMPC.1999.750602","url":null,"abstract":"In this article we outline some techniques for optimizing spectral codes using multidimensional real-to-complex FFTs. We have successfully applied these techniques on a pseudospectral MHD code running on the CRAY T3E. The code uses half precision, and runs up to 2.5 times faster than the version that uses full precision CRAY SCILIB parallel FFT routines. The half precision version without these optimizations is slower, does not scale very well, and cannot support more than 128 processors. The optimized code achieved a performance of 100 Gflops on 1024 nodes of a CRAY T3E-600 at NASA Goddard Space Flight Center.","PeriodicalId":405655,"journal":{"name":"Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation","volume":"45 10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132449469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Token space minimization by simulated annealing
Rafi Lohev, I. Gottlieb
{"title":"Token space minimization by simulated annealing","authors":"Rafi Lohev, I. Gottlieb","doi":"10.1109/FMPC.1999.750604","DOIUrl":"https://doi.org/10.1109/FMPC.1999.750604","url":null,"abstract":"We describe a heuristic solution for the minimum token space scheduling (MTSS) problem, based on simulated annealing. In MTSS, one schedules a set of tasks with precedence constraints, represented by a directed graph. The arcs in the graph represent data, or tokens, which the tasks must receive before they can be processed. MTSS seeks to minimize the maximum number of tokens extant at any time during execution, while minimizing completion time. We motivate MTSS with an application from computer architecture: maximizing the locality of data required for execution of a program by multiprocessors. Simulation results demonstrating the effectiveness of our method are presented.","PeriodicalId":405655,"journal":{"name":"Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114081511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Superconducting processors for HTMT: issues and challenges
K. B. Theobald, G. Gao, T. Sterling
{"title":"Superconducting processors for HTMT: issues and challenges","authors":"K. B. Theobald, G. Gao, T. Sterling","doi":"10.1109/FMPC.1999.750608","DOIUrl":"https://doi.org/10.1109/FMPC.1999.750608","url":null,"abstract":"The Hybrid Technology Multi-Threading project is a long-term study of the feasibility of combining several emerging technologies to reach 1 petaFLOPS within ten years. HTMT will combine high-speed superconductor processors, semiconductor memories with built-in processors, high-speed optical interconnects, and high-density holographic storage. While there are major challenges in all aspects of this project, those in processor architecture are the focus of this paper. Fundamental differences between RSFQ circuits and conventional semiconductor circuits, including a radical jump in clock speed, make today's processor design approaches inappropriate for HTMT. Sequential instruction dispatching, even within the lowest programming unit (a strand), will lead to unacceptably high latencies, hence poor performance. We propose alternative processor designs which use fine-grain synchronizations between individual instructions in order to avoid these bottlenecks.","PeriodicalId":405655,"journal":{"name":"Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126545258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5