Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238): Latest Articles

Improving error bounds for multipole-based treecodes
A. Grama, V. Sarin, A. Sameh
{"title":"Improving error bounds for multipole-based treecodes","authors":"A. Grama, V. Sarin, A. Sameh","doi":"10.1109/HIPC.1998.737973","DOIUrl":"https://doi.org/10.1109/HIPC.1998.737973","url":null,"abstract":"Rapid evaluation of potentials in particle systems is an important and time-consuming step in many physical simulations. Over the past decade (1988-98), the development of treecodes such as the Fast Multipole Method (FMM) and the Barnes-Hut method has enabled large scale simulations in domains such as astrophysics, molecular dynamics, and material science. FMM and related methods rely on fixed degree polynomial (p) approximations of the potential of a set of points in a hierarchy. We present a sequence of results to illustrate that keeping the multipole degree constant can lead to large aggregate errors. An alternate strategy based on a careful selection of the multipole degree leads to asymptotically lower errors; while incurring minimal computation overhead for practical problem sizes. The paper presents theoretical results for computing the degree of a particle cluster interaction, the error associated with the interaction, the error associated with a particle for all of its interactions, and the computational complexity of the new method. These results show that it is possible to reduce the simulation error asymptotically while incurring minimal computational overhead. The paper also presents experimental validation of these results on a 32 processor Origin 2000 in the context of problems ranging from astrophysics to boundary element solvers. In addition to verifying theoretical results, we also show that it is possible to achieve excellent parallel speedup for the treecode.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131276506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Implementing a parallel list on the SB-PRAM
A. Paul, J. Röhrig
{"title":"Implementing a parallel list on the SB-PRAM","authors":"A. Paul, J. Röhrig","doi":"10.1109/HIPC.1998.737970","DOIUrl":"https://doi.org/10.1109/HIPC.1998.737970","url":null,"abstract":"We give a description of a C++ implementation of a dynamic parallel list developed for the SB-PRAM, a massively parallel scalable shared memory computer. We show that access time on the elements stored in the parallel list is comparable with that of a sequential list. The implementation can easily be ported to other shared memory platforms supporting fast locking mechanisms and parallel prefix operations.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130969977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
An efficient implementation of a progressive image transmission system using successive pruning algorithm on a parallel architecture
S. Venkatesh, S. Srinivasan, Ray Chen
{"title":"An efficient implementation of a progressive image transmission system using successive pruning algorithm on a parallel architecture","authors":"S. Venkatesh, S. Srinivasan, Ray Chen","doi":"10.1109/HIPC.1998.738020","DOIUrl":"https://doi.org/10.1109/HIPC.1998.738020","url":null,"abstract":"Presented in this paper is an implementation using a combination of a successive pruning algorithm and a parallel architecture using a digital signal processor and a general purpose processor for progressive transmission of still images. The adaptive pruning algorithm is used for ensuring a minimum quality of the image while further progressions on the image are computed using a modified successive pruning algorithm. Due to the reduction in computations achieved using successive pruning and efficient splitting of tasks between the two processors, the speed of the entire operation is found to increase by a factor of 2.8 compared to the normal implementation of the progressive mode of JPEG.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134053917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Hierarchical architecture for parallel query processing on networks of workstations
Boquan Xie, S. Dandamudi
{"title":"Hierarchical architecture for parallel query processing on networks of workstations","authors":"Boquan Xie, S. Dandamudi","doi":"10.1109/HIPC.1998.738008","DOIUrl":"https://doi.org/10.1109/HIPC.1998.738008","url":null,"abstract":"Networks of workstations (NOWs) are cost-effective alternatives to multiprocessor systems. Recently, NOWs have been proposed for parallel query processing. Idle CPU cycles of workstations in a NOW-based system can be used to process database query operations. We report on the performance of the hierarchical architecture for parallel query processing on a NOW. We have implemented the hierarchical architecture using PVM on a Pentium-based NOW. The experimental results reported suggest that the hierarchical architecture is successful in achieving good scale-ups and speedups indicating that the idle processor cycles are effectively used for query processing. The hierarchical system can also handle both light and heavy workloads in a load sharing fashion. Our results also suggest that the performance is sensitive to the minimum fragmentation size (chunk size) for partial operations and the structure of queries.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131988503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Multiple token distributed loop local area networks: analysis
N. Chalamaiah, R. Badrinath
{"title":"Multiple token distributed loop local area networks: analysis","authors":"N. Chalamaiah, R. Badrinath","doi":"10.1109/HIPC.1998.738014","DOIUrl":"https://doi.org/10.1109/HIPC.1998.738014","url":null,"abstract":"With increased data rates, the packet transmission time of a LAN could approach or even become less than the medium propagation delay. The performance of many LAN schemes degrades rapidly under these conditions. Generally, the overhead associated with the medium access protocol increases with the increase in propagation time relative to packet transmission time. In token ring networks this overhead depends on the round trip propagation delay of the channel. Several schemes, such as multiple rings (with multiple channels) and multiple access points (with multiple tokens) are proposed to decrease this overhead. In these schemes analytical and simulation results have shown improved performance. In the present paper we propose a distributed multiconnected loop topology with multiple tokens. We also present analytical results showing the packet delay performance. Finally we compare the performance of distributed multiconnected loops with multiple ring topology in terms of media access control, token coalescence and delay.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"61 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130950932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Memory bank disambiguation using modulo unrolling for Raw machines
R. Barua, Walter Lee, Saman P. Amarasinghe, A. Agarwal
{"title":"Memory bank disambiguation using modulo unrolling for Raw machines","authors":"R. Barua, Walter Lee, Saman P. Amarasinghe, A. Agarwal","doi":"10.1109/HIPC.1998.737991","DOIUrl":"https://doi.org/10.1109/HIPC.1998.737991","url":null,"abstract":"We present modulo unrolling, a code transformation technique for enabling array references to be accessed through the fast static network on a Raw machine. A Raw machine comprises a mesh of simple, replicated tiles connected by an interconnect which supports fast, static near-neighbor communication. Like all other resources, memory is distributed across the tiles. Management of the memory can be performed by well known techniques which generate the requisite communication code on distributed address-space architectures. On the other hand, the fast, static network provides the compiler with a simple interface to optimize such communication. This paper addresses the problem of taking advantage of such static communication for memory accesses. The requirement for static memory communication is the compile-time knowledge of the exact communication required for each memory reference. This knowledge, in turn, can be obtained if a memory reference refers exclusively to memory residing on a single processing tile. We introduce modulo unrolling as a technique which allows the static communication of a large class of array accesses. We show how this technique achieves the goal of static communication by using a relatively small unroll factor. For a set of dense matrix scientific applications, we are able to access all the array references on the static network, enabling scalable speedups on the Raw machine.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131623382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
A general distributed event model
K. Chandy, R. Ginis, E. Schooler
{"title":"A general distributed event model","authors":"K. Chandy, R. Ginis, E. Schooler","doi":"10.1109/HIPC.1998.737979","DOIUrl":"https://doi.org/10.1109/HIPC.1998.737979","url":null,"abstract":"This paper identifies some of the issues that need to be explored in systems that support content-based delivery. Distributed systems today are based on address-based delivery of messages. Each message carries the unique addresses of the intended recipients. The idea in content-based delivery is that information is delivered to those objects that have subscribed for that information. The ideas on content-based delivery reported are based partly on Roman Ginis' doctoral dissertation at Caltech. The ideas on directories reported are based partly on Eve Schooler's doctoral dissertation at Caltech.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123932048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient retrieval of multidimensional datasets through parallel I/O
Sunil Prabhakar, K. Abdel-Ghaffar, D. Agrawal, A. E. Abbadi
{"title":"Efficient retrieval of multidimensional datasets through parallel I/O","authors":"Sunil Prabhakar, K. Abdel-Ghaffar, D. Agrawal, A. E. Abbadi","doi":"10.1109/HIPC.1998.738011","DOIUrl":"https://doi.org/10.1109/HIPC.1998.738011","url":null,"abstract":"Many scientific and engineering applications process large multidimensional datasets. An important access pattern for these applications is the retrieval of data corresponding to ranges of values in multiple dimensions. Performance is limited by disk largely due to high disk latencies. Tiling and distributing the data across multiple disks is an effective technique for improving performance through parallel I/O. The distribution of tiles across the disks is an important factor in achieving gains. Several schemes for declustering multidimensional data to improve the performance of range queries have been proposed in the literature. We extend the class of cyclic schemes which have been developed earlier for two-dimensional data to multiple dimensions. We establish important properties of cyclic schemes, based upon which we reduce the search space for determining good declustering schemes within the class of cyclic schemes. Through experimental evaluation, we establish that the cyclic schemes are superior to other declustering schemes, including the state-of-the-art, both in terms of the degree of parallelism and robustness.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121525190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
More on arbitrary boundary packed arithmetic
P. Karthikeyan, P. Ranganathan
{"title":"More on arbitrary boundary packed arithmetic","authors":"P. Karthikeyan, P. Ranganathan","doi":"10.1109/HIPC.1998.737966","DOIUrl":"https://doi.org/10.1109/HIPC.1998.737966","url":null,"abstract":"Recent microprocessors have been enhanced with media instruction sets for accelerating media algorithms. They exploit the fact that media algorithms have small data types, and widths much less than that of the processor. Current media instruction sets support only 8-, 16- and 32-bit sub-datatypes. This scheme is inefficient in several applications where bit lengths of 9, 12 and so on are used. We need user programmable sub-datatype bit lengths. S. Balakrishnan and S.K. Nandy (1998) discuss arbitrary boundary packed addition. Many media algorithms are based on multiply-accumulate algorithms. For full acceleration we also need arbitrary boundary packed multiplication. We present such a scheme based on Wallace tree multiplication. We also expand on Balakrishnan and Nandy and provide a detailed treatment of the intermediate carries of sub-datatypes which were lost in the previous work. These carries could be used for saturation arithmetic and flow control.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121845203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Exploiting image processing locality in cache pre-fetching
R. Cucchiara, M. Piccardi
{"title":"Exploiting image processing locality in cache pre-fetching","authors":"R. Cucchiara, M. Piccardi","doi":"10.1109/HIPC.1998.738023","DOIUrl":"https://doi.org/10.1109/HIPC.1998.738023","url":null,"abstract":"Emerging trends in computer design attempt to include specific solutions for handling images also in general-purpose computers, because of the current spread of multimedia, image processing and computer graphics applications. In this context, we propose hardware pre-fetching techniques specific for caching images: the main issue we state is that most algorithms working on images exhibit a 2D spatial locality that is not taken into account in current cache organization and data access strategies. To this aim we propose an adaptive local pre-fetching for the image data type; this technique, mirroring the two-dimensional spatial locality of image processing algorithms, results in being more efficient than other approaches, such as sequential pre-fetching and adaptive pre-fetching. Performance is evaluated on different classes of image processing algorithms, namely raster-scan and propagative algorithms, common in computer vision and multimedia applications.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116966263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11