Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238): Latest Articles, Page 3

Improving error bounds for multipole-based treecodes

Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238) Pub Date : 1998-12-17 DOI: 10.1109/HIPC.1998.737973

A. Grama, V. Sarin, A. Sameh

Abstract: Rapid evaluation of potentials in particle systems is an important and time-consuming step in many physical simulations. Over the past decade (1988-98), the development of treecodes such as the Fast Multipole Method (FMM) and the Barnes-Hut method has enabled large-scale simulations in domains such as astrophysics, molecular dynamics, and material science. FMM and related methods rely on fixed-degree polynomial (p) approximations of the potential of a set of points in a hierarchy. We present a sequence of results to illustrate that keeping the multipole degree constant can lead to large aggregate errors. An alternate strategy based on a careful selection of the multipole degree leads to asymptotically lower errors, while incurring minimal computation overhead for practical problem sizes. The paper presents theoretical results for computing the degree of a particle-cluster interaction, the error associated with the interaction, the error associated with a particle for all of its interactions, and the computational complexity of the new method. These results show that it is possible to reduce the simulation error asymptotically while incurring minimal computational overhead. The paper also presents experimental validation of these results on a 32-processor Origin 2000 in the context of problems ranging from astrophysics to boundary element solvers. In addition to verifying theoretical results, we also show that it is possible to achieve excellent parallel speedup for the treecode.

Citations: 11

Implementing a parallel list on the SB-PRAM

Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238) Pub Date : 1998-12-17 DOI: 10.1109/HIPC.1998.737970

A. Paul, J. Röhrig

Citations: 1

An efficient implementation of a progressive image transmission system using successive pruning algorithm on a parallel architecture

Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238) Pub Date : 1998-12-17 DOI: 10.1109/HIPC.1998.738020

S. Venkatesh, S. Srinivasan, Ray Chen

Citations: 3

Hierarchical architecture for parallel query processing on networks of workstations

Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238) Pub Date : 1998-12-17 DOI: 10.1109/HIPC.1998.738008

Boquan Xie, S. Dandamudi

Citations: 5

Multiple token distributed loop local area networks: analysis

Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238) Pub Date : 1998-12-17 DOI: 10.1109/HIPC.1998.738014

N. Chalamaiah, R. Badrinath

Citations: 3

Memory bank disambiguation using modulo unrolling for Raw machines

Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238) Pub Date : 1998-12-17 DOI: 10.1109/HIPC.1998.737991

R. Barua, Walter Lee, Saman P. Amarasinghe, A. Agarwal

Abstract: We present modulo unrolling, a code transformation technique for enabling array references to be accessed through the fast static network on a Raw machine. A Raw machine comprises a mesh of simple, replicated tiles connected by an interconnect which supports fast, static near-neighbor communication. Like all other resources, memory is distributed across the tiles. Management of the memory can be performed by well-known techniques which generate the requisite communication code on distributed address-space architectures. On the other hand, the fast, static network provides the compiler with a simple interface to optimize such communication. This paper addresses the problem of taking advantage of such static communication for memory accesses. The requirement for static memory communication is the compile-time knowledge of the exact communication required for each memory reference. This knowledge, in turn, can be obtained if a memory reference refers exclusively to memory residing on a single processing tile. We introduce modulo unrolling as a technique which allows the static communication of a large class of array accesses. We show how this technique achieves the goal of static communication by using a relatively small unroll factor. For a set of dense matrix scientific applications, we are able to access all the array references on the static network, enabling scalable speedups on the Raw machine.
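
The idea in the abstract above can be illustrated with a minimal sketch. It assumes, as an illustration only, that array elements are low-order interleaved across the memory banks (element i lives in bank i mod N); the abstract does not state the layout. The function names are hypothetical. The sketch shows that a single reference A[i] in a rolled loop touches every bank, while after unrolling by the number of banks each static reference in the loop body touches exactly one bank, which is the compile-time knowledge the abstract asks for.

```python
def bank_of(index: int, num_banks: int) -> int:
    """Bank holding element `index` under low-order interleaving (assumed layout)."""
    return index % num_banks


def banks_touched_per_reference(n: int, num_banks: int, unroll: int) -> list[set[int]]:
    """Model the loop `for i in range(n): use(A[i])` unrolled by `unroll`.

    Returns, for each of the `unroll` static references in the unrolled body,
    the set of banks that reference touches over the whole run.
    """
    touched: list[set[int]] = [set() for _ in range(unroll)]
    # Main unrolled loop; any leftover iterations (n % unroll) are ignored here.
    for base in range(0, n - n % unroll, unroll):
        for k in range(unroll):  # k-th static reference reads A[base + k]
            touched[k].add(bank_of(base + k, num_banks))
    return touched


if __name__ == "__main__":
    # Rolled loop: the one reference A[i] visits all four banks.
    print(banks_touched_per_reference(16, num_banks=4, unroll=1))
    # Unrolled by the number of banks: reference k always hits bank k.
    print(banks_touched_per_reference(16, num_banks=4, unroll=4))
```

With 4 banks, an unroll factor of 4 is enough for this access pattern, which is consistent with the abstract's remark that a relatively small unroll factor suffices.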

Citations: 25

A general distributed event model

Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238) Pub Date : 1998-12-17 DOI: 10.1109/HIPC.1998.737979

K. Chandy, R. Ginis, E. Schooler

Citations: 0

Efficient retrieval of multidimensional datasets through parallel I/O

Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238) Pub Date : 1998-12-17 DOI: 10.1109/HIPC.1998.738011

Sunil Prabhakar, K. Abdel-Ghaffar, D. Agrawal, A. El Abbadi

Abstract: Many scientific and engineering applications process large multidimensional datasets. An important access pattern for these applications is the retrieval of data corresponding to ranges of values in multiple dimensions. Performance is limited by the disks, largely due to high disk latencies. Tiling and distributing the data across multiple disks is an effective technique for improving performance through parallel I/O. The distribution of tiles across the disks is an important factor in achieving gains. Several schemes for declustering multidimensional data to improve the performance of range queries have been proposed in the literature. We extend the class of cyclic schemes, which were developed earlier for two-dimensional data, to multiple dimensions. We establish important properties of cyclic schemes, based upon which we reduce the search space for determining good declustering schemes within the class of cyclic schemes. Through experimental evaluation, we establish that the cyclic schemes are superior to other declustering schemes, including the state-of-the-art, both in terms of the degree of parallelism and robustness.
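
A minimal sketch of a cyclic declustering scheme follows. It assumes the common formulation in which tile (i_1, ..., i_d) is placed on disk (H_1*i_1 + ... + H_d*i_d) mod M for fixed skip values H_k; the abstract does not spell out the mapping, so the formula, the function names, and the example parameters are illustrative assumptions. The cost of a range query is modeled as the largest number of its tiles that land on any single disk.

```python
from itertools import product
from typing import Sequence


def cyclic_disk(tile: Sequence[int], skips: Sequence[int], num_disks: int) -> int:
    """Disk assigned to `tile` under the assumed cyclic mapping."""
    return sum(h * i for h, i in zip(skips, tile)) % num_disks


def query_cost(lo: Sequence[int], hi: Sequence[int],
               skips: Sequence[int], num_disks: int) -> int:
    """Cost of the range query covering tiles lo[k] <= i_k < hi[k].

    The cost is the maximum number of requested tiles on one disk,
    since tiles on different disks can be fetched in parallel.
    """
    load = [0] * num_disks
    for tile in product(*(range(a, b) for a, b in zip(lo, hi))):
        load[cyclic_disk(tile, skips, num_disks)] += 1
    return max(load)


if __name__ == "__main__":
    # A 2 x 2 tile query on 5 disks: the choice of skip values matters.
    print(query_cost((0, 0), (2, 2), skips=(1, 1), num_disks=5))
    print(query_cost((0, 0), (2, 2), skips=(1, 2), num_disks=5))
```

In this toy case the skips (1, 1) put two of the four tiles on the same disk, while (1, 2) spread them over four distinct disks. Comparing candidate skip values in this way is the kind of search that the abstract says its properties of cyclic schemes help to narrow.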

Citations: 27

More on arbitrary boundary packed arithmetic

Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238) Pub Date : 1998-12-17 DOI: 10.1109/HIPC.1998.737966

P. Karthikeyan, P. Ranganathan

Citations: 2

Exploiting image processing locality in cache pre-fetching

Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238) Pub Date : 1998-12-17 DOI: 10.1109/HIPC.1998.738023

R. Cucchiara, M. Piccardi

Citations: 11