Proceedings of the IEEE/ACM SC98 Conference: Latest Papers, Page 5

The Large Scale Parallelization of a Conformational 3D Protein Structure Prediction Application

Proceedings of the IEEE/ACM SC98 Conference Pub Date : 1998-11-07 DOI: 10.1109/SC.1998.10022

P. LoCascio, K. Yue, P. Cummings, K. Dill

Citations: 0

High Performance Multidimensional Analysis and Data Mining

Proceedings of the IEEE/ACM SC98 Conference Pub Date : 1998-11-07 DOI: 10.1109/SC.1998.10043

Sanjay Goil, A. Choudhary

Abstract: Summary information from data in large databases is used to answer queries in On-Line Analytical Processing (OLAP) systems and to build decision support systems over them. The Data Cube is used to calculate and store summary information on a variety of dimensions, which is computed only partially if the number of dimensions is large. Queries posed on such systems are quite complex and require different views of data. These may either be answered from a materialized cube in the data cube or calculated on the fly. Further, data mining for associations can be performed on the data cube. Analytical models need to capture the multidimensionality of the underlying data, a task for which multidimensional databases are well suited. They are also amenable to parallelism, which is necessary to deal with large (and still growing) data sets. Multidimensional databases store data in a multidimensional structure on which analytical operations are performed. A challenge for these systems is how to handle large data sets in a large number of dimensions. These techniques are also applicable to scientific and statistical databases (SSDB), which employ large multidimensional databases and dimensional operations over them. In this paper we present (1) a parallel infrastructure for OLAP multidimensional databases integrated with association rule mining; (2) the Bit-Encoded Sparse Structure (BESS) for sparse data storage in chunks; (3) scheduling optimizations for parallel computation of complete and partial data cubes; and (4) an implementation of a large-scale multidimensional database engine suitable for dimensional analysis used in OLAP and SSDB for (a) a large number of dimensions (20-30) and (b) large data sets (tens of gigabytes). Our implementation on the IBM SP-2 can handle large data sets and a large number of dimensions by using disk I/O. Results are presented showing its performance and scalability.

Citations: 29
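
The abstract above names a Bit-Encoded Sparse Structure (BESS) for storing sparse chunks but does not give its layout. The following is a minimal sketch of one plausible reading, in which the dimension indices of each non-empty cell are packed into a single integer using a fixed number of bits per dimension. The function names (`bits_needed`, `encode`, `decode`) and the exact packing order are assumptions made for illustration, not the authors' implementation.

```python
def bits_needed(size):
    """Number of bits required to represent any index in range(size)."""
    return max(1, (size - 1).bit_length())


def encode(index, dim_sizes):
    """Pack a tuple of dimension indices into one integer key.

    Assumed layout: the first dimension occupies the most significant bits.
    """
    key = 0
    for i, size in zip(index, dim_sizes):
        if not 0 <= i < size:
            raise ValueError(f"index {i} out of range for dimension of size {size}")
        key = (key << bits_needed(size)) | i
    return key


def decode(key, dim_sizes):
    """Recover the tuple of dimension indices from a packed key."""
    index = []
    for size in reversed(dim_sizes):
        width = bits_needed(size)
        index.append(key & ((1 << width) - 1))  # mask off this dimension's bits
        key >>= width
    return tuple(reversed(index))


def store_sparse_chunk(cells, dim_sizes):
    """Store only the non-empty cells of a chunk as sorted (key, value) pairs."""
    return sorted((encode(idx, dim_sizes), value) for idx, value in cells.items())
```

For a chunk with dimension sizes (10, 4, 100), the sketch uses 4 + 2 + 7 = 13 bits per cell key, so the cost of storing a cell's position does not grow with the number of empty cells around it.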

Application of a High Performance Parallel Eigensolver to Electronic Structure Calculations

Proceedings of the IEEE/ACM SC98 Conference Pub Date : 1998-11-07 DOI: 10.1109/SC.1998.10037

M. P. Sears, K. Stanley, G. Henry

Citations: 18

Analyzing the Error Bounds of Multipole-Based Treecodes

Proceedings of the IEEE/ACM SC98 Conference Pub Date : 1998-11-07 DOI: 10.1109/SC.1998.10041

V. Sarin, A. Grama, A. Sameh

Abstract: The problem of evaluating the potential due to a set of particles is an important and time-consuming one. The development of fast treecodes such as the Barnes-Hut and Fast Multipole Methods for n-body systems has enabled large scale simulations in astrophysics [9, 10, 13] and molecular dynamics [1]. Coupled with efficient parallel processing, these treecodes are capable of yielding several orders of magnitude improvement in performance [6, 14, 15]. In addition, treecodes have applications in the solution of dense linear systems arising from boundary element methods [3, 4, 5, 11, 12]. Using a p-term multipole expansion, the FMM reduces the complexity of a single timestep from O(n^2) to O(p^2 n) and the Barnes-Hut method reduces it to O(p^2 log n) for a uniform distribution. In this paper, we analyze the approximations introduced by these methods. We describe an algorithm that reduces the error significantly by selecting the multipole degree appropriately for different clusters. Furthermore, we show that for practical problem sizes, this increases the computational complexity only marginally. We support our theoretical result with experiments in the context of particle simulations as well as boundary element methods. Our POSIX threads-based treecode yields excellent speedups on a 32-processor SGI Origin 2000, even for relatively small problems.

Citations: 6
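
The abstract above mentions selecting the multipole degree per cluster but does not state the selection rule. As a hedged illustration, the sketch below uses the classical truncation bound for a degree-p multipole expansion of a 1/r potential, error <= A / (r - a) * (a / r)^(p + 1), where A is the total absolute charge in the cluster, a is the cluster radius, and r is the distance from the cluster center to the evaluation point. Picking the smallest p that meets a tolerance is an assumption made for illustration and is not necessarily the paper's algorithm; the function names are hypothetical.

```python
def truncation_error_bound(charge, a, r, p):
    """Classical error bound for a degree-p multipole expansion.

    charge: sum of absolute charges in the cluster (A)
    a:      radius of the cluster
    r:      distance from cluster center to the evaluation point, r > a
    """
    if r <= a:
        raise ValueError("evaluation point must lie outside the cluster (r > a)")
    return charge / (r - a) * (a / r) ** (p + 1)


def min_degree(charge, a, r, tol, p_max=60):
    """Smallest degree p whose error bound is at most tol (capped at p_max)."""
    for p in range(p_max + 1):
        if truncation_error_bound(charge, a, r, p) <= tol:
            return p
    return p_max
```

With unit charge and unit radius, a cluster at distance 2 needs degree 9 to reach a tolerance of 1e-3, while a cluster at distance 10 needs only degree 2. This shows why a single fixed degree either wastes work on distant clusters or loses accuracy on nearby ones.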

S-HARP: A Scalable Parallel Dynamic Partitioner for Adaptive Mesh-based Computations

Proceedings of the IEEE/ACM SC98 Conference Pub Date : 1998-11-07 DOI: 10.1109/SC.1998.10023

A. Sohn, H. Simon

Abstract: Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. We present in this report a scalable parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the speed of inertial partitioning and the quality of spectral partitioning. S-HARP is a universal dynamic partitioner with three distinctive features: (a) fast partitioning from scratch with a global view, requiring no information from the previous iterations, (b) no restriction to one partition per processor, and (c) no imbalance factor, because of precise bisection using sorting. Two types of parallelism have been exploited in S-HARP: fine-grain loop-level parallelism and coarse-grain recursive parallelism. The parallel partitioner has been implemented using the Message Passing Interface on the Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.18 seconds on a 64-processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 17-fold speedup on 64 processors while ParaMeTiS1.0 gives a few-fold speedup. Experimental results demonstrate that S-HARP is 3 to 15 times faster than the other dynamic partitioners on computational meshes of over 100,000 vertices while giving comparable edge cuts.

Citations: 14
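
The abstract above credits "precise bisection using sorting" for the absence of an imbalance factor. The sketch below illustrates that idea with plain recursive inertial bisection in two dimensions: project the vertices onto their principal axis, sort the projections, and split at the median. It is sequential, works on raw 2D coordinates instead of spectral coordinates, and uses hypothetical function names, so it should be read as a simplified illustration and not as S-HARP itself.

```python
import math


def principal_axis(points):
    """Unit vector along the direction of largest spread of 2D points."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Orientation of the major axis of the 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)


def bisect(points, ids):
    """Split vertex ids into two halves whose sizes differ by at most one."""
    ax, ay = principal_axis([points[i] for i in ids])
    ordered = sorted(ids, key=lambda i: points[i][0] * ax + points[i][1] * ay)
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


def partition(points, num_parts):
    """Recursively bisect until num_parts partitions exist.

    num_parts must be a power of two and no larger than len(points).
    """
    if num_parts < 1 or num_parts & (num_parts - 1) or num_parts > len(points):
        raise ValueError("num_parts must be a power of two and <= len(points)")
    parts = [list(range(len(points)))]
    while len(parts) < num_parts:
        parts = [half for ids in parts for half in bisect(points, ids)]
    return parts
```

Because each split is taken at the median of a sorted list, partition sizes can differ by at most one vertex per bisection, independent of how the mesh is distributed in space.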

A Scalable and Highly Available System for Serving Dynamic Data at Frequently Accessed Web Sites

Proceedings of the IEEE/ACM SC98 Conference Pub Date : 1998-11-07 DOI: 10.1109/SC.1998.10044

J. Challenger, P. Dantzig, A. Iyengar

Abstract: This paper describes the system and key techniques used for achieving performance and high availability at the official Web site for the 1998 Olympic Winter Games, which was one of the most popular Web sites for the duration of the Olympic Games. The Web site utilized thirteen SP2 systems scattered around the globe containing a total of 143 processors. A key feature of the Web site was that the data being presented to clients was constantly changing. Whenever new results were entered into the system, updated Web pages reflecting the changes were made available to the rest of the world within seconds. One technique we used to serve dynamic data efficiently to clients was to cache dynamic pages so that they only had to be generated once. We developed and implemented a new algorithm we call Data Update Propagation (DUP), which identifies the cached pages that have become stale as a result of changes to underlying data on which the cached pages depend, such as databases. For the Olympic Games Web site, we were able to update stale pages directly in the cache, which obviated the need to invalidate them. This allowed us to achieve cache hit rates of close to 100%. Our system was able to serve pages to clients quickly during the entire Olympic Games, even during peak periods. In addition, the site was available 100% of the time. We describe the key features employed by our site for high availability. We also describe how the Web site was structured to provide useful information while requiring clients to examine only a small number of pages.

Citations: 87
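
The abstract above describes Data Update Propagation (DUP) as identifying cached pages made stale by changes to underlying data and then updating them in place instead of invalidating them. The sketch below illustrates that behavior with a flat mapping from data items to dependent pages. The class name `DupCache`, its methods, and the `render` callback are hypothetical, and only direct dependencies are tracked; the real system may use a richer dependency structure.

```python
from collections import defaultdict


class DupCache:
    """Page cache that refreshes dependent pages when underlying data changes."""

    def __init__(self, render):
        # render is an assumed callback: page_id -> freshly generated page content.
        self._render = render
        self._pages = {}
        self._dependents = defaultdict(set)  # data item -> ids of pages that use it

    def put(self, page_id, depends_on):
        """Generate a page once, cache it, and record what data it depends on."""
        for item in depends_on:
            self._dependents[item].add(page_id)
        self._pages[page_id] = self._render(page_id)

    def get(self, page_id):
        """Serve a page from the cache; no regeneration happens on reads."""
        return self._pages[page_id]

    def data_changed(self, item):
        """Regenerate every cached page that depends on the changed item.

        Stale pages are overwritten in the cache, so later reads still hit.
        Returns the ids of the refreshed pages.
        """
        stale = sorted(self._dependents.get(item, ()))
        for page_id in stale:
            self._pages[page_id] = self._render(page_id)
        return stale
```

Overwriting a stale entry instead of evicting it is what keeps the hit rate high in this sketch: a read never finds a missing page, so generation cost is paid on updates and not on client requests.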

Highly Efficient Gang Scheduling Implementation

Proceedings of the IEEE/ACM SC98 Conference Pub Date : 1998-11-07 DOI: 10.1109/SC.1998.10007

A. Hori, H. Tezuka, Y. Ishikawa

Abstract: A new and more efficient gang scheduling implementation technique is the basis for this paper. Network preemption, in which network interface contexts are saved and restored, has already been proposed to enable parallel applications to perform efficient user-level communication. This network preemption technique can also be used for detecting the global state, such as deadlock, of a parallel program execution. A gang scheduler, SCore-D, using the network preemption technique is implemented with PM, a user-level communication library. This paper evaluates network preemption gang scheduling overhead using eight NAS parallel benchmark programs. The results of this evaluation show that saving and restoring network contexts accounts for almost half of the total gang scheduling overhead. A new mechanism is proposed that keeps multiple network contexts and merely switches the context pointers, without saving and restoring the network contexts. The NAS parallel benchmark evaluation shows that gang scheduling overhead is almost halved. The maximum gang scheduling overhead among the benchmark programs is less than 10%, with a 40 msec time slice on 64 single-way PentiumPros connected by Myrinet to form a PC cluster. The numbers of secondary cache misses are counted, and it is found that network preemption with multiple network contexts is more cache-effective than with a single network context. The observed scheduling overhead for applications running on 64 nodes is only a small percentage of the execution time. The gang scheduling overheads of switching between two NAS parallel benchmark programs are also evaluated. The additional overheads are less than 2% in most cases, with a 100 msec time slice on 64 nodes. These slightly higher scheduling overheads, compared with switching a single parallel process, come from more frequent cache misses. This paper contributes the following findings: (i) gang scheduling overhead with network preemption can be sufficiently low, (ii) the proposed network preemption with multiple network contexts is more cache-effective than with a single network context, and (iii) network preemption can be applied to detect global states of user parallel processes. The SCore-D gang scheduler realized by network preemption can utilize processor resources by detecting the global state of user parallel processes. Network preemption with multiple contexts exhibits highly efficient gang scheduling. The combination of low scheduling overhead and the global state detection mechanism achieves an interactive parallel programming environment in which parallel program development and production runs of parallel programs can be mixed freely.

Citations: 37

An Initial Evaluation of the Tera Multithreaded Architecture and Programming System Using the C3I Parallel Benchmark Suite

Proceedings of the IEEE/ACM SC98 Conference Pub Date : 1998-11-07 DOI: 10.1109/SC.1998.10048

S. Brunett, J. Thornley, M. Ellenbecker

Abstract: The Tera Multithreaded Architecture (MTA) is a radical new architecture intended to revolutionize high-performance computing in both the scientific and commercial marketplaces. Each processor supports 128 threads in hardware. Extremely fast thread switching is used to mask latency in a uniform-access memory system without caching. It is claimed that these hardware characteristics allow compilers to easily transform sequential programs into efficient multithreaded programs for the Tera MTA. In this paper, we attempt to provide an objective initial evaluation of the performance of the Tera multithreaded architecture and programming system for general-purpose applications. The basis of our investigation is two programs from the C3I Parallel Benchmark Suite (C3IPBS). Both these programs have previously been shown to have the potential for large-scale parallelization. We compare the performance of these programs on (i) a fast uniprocessor, (ii) two conventional shared-memory multiprocessors, and (iii) the first installed Tera MTA (at the San Diego Supercomputer Center). On these platforms, we compare the effectiveness of both automatic and manual parallelization.

Citations: 13

A Parallel Linear Algebra Server for Matlab-like Environments

Proceedings of the IEEE/ACM SC98 Conference Pub Date : 1998-11-07 DOI: 10.1109/SC.1998.10015

G. Morrow, R. V. D. Geijn

Citations: 14

Automatically Tuned Linear Algebra Software

Proceedings of the IEEE/ACM SC98 Conference Pub Date : 1998-11-07 DOI: 10.1109/SC.1998.10004

R. C. Whaley, Jack J. Dongarra

Citations: 1170