Proceedings of the IEEE/ACM SC98 Conference: Latest Publications

The Large Scale Parallelization of a Conformational 3D Protein Structure Prediction Application
Proceedings of the IEEE/ACM SC98 Conference, Pub Date: 1998-11-07, DOI: 10.1109/SC.1998.10022
P. LoCascio, K. Yue, P. Cummings, K. Dill
Abstract: We present the design strategy and performance analysis of a large-scale scientific application for the prediction of 3D protein structures. The unique challenges investigated are the two primary objectives: reducing wall-clock run time through parallelization, and producing an application that runs and scales reliably on a massively parallel configuration (currently 1024 nodes of the Intel Paragon) over many non-contiguous days of supercomputer time. Using MPI as the parallel software layer gives the application enough flexibility to be reconfigured for a number of other parallel architectures, including the CRAY T3E and IBM SP2. The application, GEOCORE, predicts small ensembles of native-like peptide conformations from amino acid sequences. GEOCORE uses a very simple energy function and an extensive conformational search process. The serial program has been tested on around 20 small peptides and shown to be capable of discriminating native from non-native structures.
Citations: 0
High Performance Multidimensional Analysis and Data Mining
Proceedings of the IEEE/ACM SC98 Conference, Pub Date: 1998-11-07, DOI: 10.1109/SC.1998.10043
Sanjay Goil, A. Choudhary
Abstract: Summary information from data in large databases is used to answer queries in On-Line Analytical Processing (OLAP) systems and to build decision support systems over them. The Data Cube is used to calculate and store summary information over a variety of dimensions; when the number of dimensions is large, it is computed only partially. Queries posed on such systems are quite complex and require different views of the data. They may be answered either from a materialized cube in the data cube or by calculation on the fly. Further, data mining for associations can be performed on the data cube. Analytical models need to capture the multidimensionality of the underlying data, a task for which multidimensional databases are well suited. They are also amenable to parallelism, which is necessary to deal with large (and still growing) data sets. Multidimensional databases store data in multidimensional structures on which analytical operations are performed. A challenge for these systems is how to handle large data sets in a large number of dimensions. These techniques also apply to scientific and statistical databases (SSDB), which employ large multidimensional databases and dimensional operations over them. In this paper we present (1) a parallel infrastructure for OLAP multidimensional databases integrated with association rule mining; (2) the Bit-Encoded Sparse Structure (BESS) for storing sparse data in chunks; (3) scheduling optimizations for the parallel computation of complete and partial data cubes; and (4) an implementation of a large-scale multidimensional database engine suitable for the dimensional analysis used in OLAP and SSDB, supporting (a) a large number of dimensions (20-30) and (b) large data sets (tens of gigabytes). Our implementation on the IBM SP-2 handles large data sets and a large number of dimensions by using disk I/O. Results are presented showing its performance and scalability.
Citations: 29
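The abstract names BESS without spelling out its layout. A minimal sketch, assuming that each non-zero cell of a chunk is stored as its chunk-local coordinates packed into the bit fields of one integer, using just enough bits per dimension for that dimension's extent, paired with the cell's measure value. The functions `bits_for`, `encode`, `decode`, and `to_sparse_chunk` are hypothetical illustrations, not the paper's code.

```python
def bits_for(extent: int) -> int:
    """Bits needed to store an index in [0, extent); at least one bit."""
    return max(1, (extent - 1).bit_length())

def encode(coords, extents):
    """Pack chunk-local coordinates into one integer, first dimension in the high bits."""
    key = 0
    for c, e in zip(coords, extents):
        if not 0 <= c < e:
            raise ValueError("coordinate out of range for its dimension")
        key = (key << bits_for(e)) | c
    return key

def decode(key, extents):
    """Unpack an integer produced by encode() back into a coordinate tuple."""
    coords = []
    for e in reversed(extents):
        b = bits_for(e)
        coords.append(key & ((1 << b) - 1))  # low bits hold the last dimension
        key >>= b
    return tuple(reversed(coords))

def to_sparse_chunk(cells, extents):
    """Keep only non-zero cells, as (bit-encoded key, value) pairs sorted by key."""
    return sorted((encode(c, extents), v) for c, v in cells.items() if v != 0)
```

With extents (4, 10, 3), each cell's position fits in 2 + 4 + 2 = 8 bits instead of three full integers, and sorting by key keeps the chunk in row-major order for scans and aggregation.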
Application of a High Performance Parallel Eigensolver to Electronic Structure Calculations
Proceedings of the IEEE/ACM SC98 Conference, Pub Date: 1998-11-07, DOI: 10.1109/SC.1998.10037
M. P. Sears, K. Stanley, G. Henry
Abstract: In this paper we report the development of a very high performance parallel eigensolver based on the portable ScaLAPACK library, and its application to electronic structure calculations in the MP-Quest code. This work was done on ASCI-Red, a supercomputer at Sandia National Laboratories based on over 4600 dual-processor Pentium Pro nodes. We report sustained performance of 605 GFLOPS for the whole code and peak performance of 684 GFLOPS in the eigensolver, comparable to the performance obtained from MP-Linpack on a problem of similar size. For a smaller problem, we obtain sustained performance of 420 GFLOPS in the application and peak eigensolver performance of 563 GFLOPS. The impact of this work on this specific application is important, but the significant improvements to a portable eigensolver and other libraries will also benefit a number of other applications.
Citations: 18
Analyzing the Error Bounds of Multipole-Based Treecodes
Proceedings of the IEEE/ACM SC98 Conference, Pub Date: 1998-11-07, DOI: 10.1109/SC.1998.10041
V. Sarin, A. Grama, A. Sameh
Abstract: The problem of evaluating the potential due to a set of particles is important and time-consuming. The development of fast treecodes such as the Barnes-Hut method and the Fast Multipole Method (FMM) for n-body systems has enabled large-scale simulations in astrophysics [9, 10, 13] and molecular dynamics [1]. Coupled with efficient parallel processing, these treecodes can yield performance improvements of several orders of magnitude [6, 14, 15]. In addition, treecodes have applications in the solution of dense linear systems arising from boundary element methods [3, 4, 5, 11, 12]. Using a p-term multipole expansion, the FMM reduces the complexity of a single timestep from O(n^2) to O(p^2 n), and the Barnes-Hut method reduces it to O(p^2 n log n), for a uniform distribution. In this paper, we analyze the approximations introduced by these methods. We describe an algorithm that reduces the error significantly by selecting the multipole degree appropriately for different clusters. Furthermore, we show that for practical problem sizes this increases the computational complexity only marginally. We support our theoretical result with experiments on particle simulations as well as boundary element methods. Our POSIX threads-based treecode yields excellent speedups on a 32-processor SGI Origin 2000, even for relatively small problems.
Citations: 6
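To make the role of the expansion order p concrete, here is a minimal sketch of a standard two-dimensional multipole expansion for logarithmic potentials (the Greengard-Rokhlin form, written with complex numbers). It is not the authors' adaptive-degree algorithm; it only shows how truncating a cluster's expansion at p terms trades accuracy for work. For sources z_i with charges q_i around a center z_c, the potential at a well-separated point z is approximated by Re[Q log(z - z_c) + sum_{k=1..p} a_k / (z - z_c)^k], with Q = sum_i q_i and a_k = -(1/k) sum_i q_i (z_i - z_c)^k. The function names and the test configuration are illustrative assumptions.

```python
import cmath
import random

def direct_potential(z, sources):
    """Exact 2-D potential at z: sum of q * log|z - z_i| over (z_i, q) pairs, O(n) per target."""
    return sum(q * cmath.log(z - zi).real for zi, q in sources)

def multipole_coeffs(sources, center, p):
    """Coefficients a_1..a_p, with a_k = -(1/k) * sum of q * (z_i - center)**k."""
    return [-sum(q * (zi - center) ** k for zi, q in sources) / k for k in range(1, p + 1)]

def multipole_potential(z, sources, center, p):
    """p-term approximation, valid when z is farther from center than every source."""
    total = sum(q for _, q in sources)
    w = total * cmath.log(z - center)
    for k, ak in enumerate(multipole_coeffs(sources, center, p), start=1):
        w += ak / (z - center) ** k
    return w.real  # only the real part is the physical potential

if __name__ == "__main__":
    rng = random.Random(0)
    # 200 charges in a small square around the origin, evaluated at a well-separated target.
    sources = [(complex(rng.uniform(-0.35, 0.35), rng.uniform(-0.35, 0.35)), rng.uniform(-1.0, 1.0))
               for _ in range(200)]
    target = complex(4.0, 1.0)
    exact = direct_potential(target, sources)
    for p in (2, 6, 12, 20):
        print(p, abs(exact - multipole_potential(target, sources, 0j, p)))
```

Once a cluster's p coefficients are formed, each well-separated target costs O(p) instead of O(n), and the error falls off rapidly as p grows. In 3-D the expansions use spherical harmonics with on the order of p^2 terms, which is where the p^2 factor in the complexities quoted above comes from.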
S-HARP: A Scalable Parallel Dynamic Partitioner for Adaptive Mesh-based Computations
Proceedings of the IEEE/ACM SC98 Conference, Pub Date: 1998-11-07, DOI: 10.1109/SC.1998.10023
A. Sohn, H. Simon
Abstract: Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. We present in this report a scalable parallel dynamic partitioner called S-HARP. S-HARP combines the speed of inertial partitioning with the quality of spectral partitioning. S-HARP is a universal dynamic partitioner with three distinctive features: (a) fast partitioning from scratch with a global view, requiring no information from previous iterations; (b) no restriction to one partition per processor; and (c) no imbalance factor, because bisection by sorting splits the vertices precisely. Two types of parallelism are exploited in S-HARP: fine-grain loop-level parallelism and coarse-grain recursive parallelism. For portability, the parallel partitioner has been implemented with the Message Passing Interface (MPI) on the Cray T3E and IBM SP2. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.18 seconds on a 64-processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 17-fold speedup on 64 processors, while ParaMeTiS 1.0 gives only a few-fold speedup. Experimental results demonstrate that S-HARP is 3 to 15 times faster than the other dynamic partitioners on computational meshes of over 100,000 vertices, while giving comparable edge cuts.
Citations: 14
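The idea of precise bisection by sorting can be illustrated with plain recursive inertial bisection. A minimal sketch, assuming 2-D vertex coordinates: compute the principal inertial axis of the point set, project every vertex onto it, sort by the projection, and cut at the median, recursing until 2^levels parts exist. S-HARP itself applies this in a spectral coordinate space and in parallel; this serial, purely geometric version and its function names are hypothetical illustrations.

```python
import math

def principal_axis(points):
    """Unit direction of largest spread: dominant eigenvector of the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Closed-form orientation of the major axis of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)

def inertial_bisect(ids, coords):
    """Split vertex ids into two halves whose sizes differ by at most one."""
    ax, ay = principal_axis([coords[i] for i in ids])
    # Sorting by projection yields an exact median cut, so no imbalance tolerance is needed.
    order = sorted(ids, key=lambda i: coords[i][0] * ax + coords[i][1] * ay)
    half = len(order) // 2
    return order[:half], order[half:]

def recursive_partition(ids, coords, levels):
    """Bisect recursively `levels` times, producing 2**levels parts."""
    if levels == 0:
        return [ids]
    left, right = inertial_bisect(ids, coords)
    return (recursive_partition(left, coords, levels - 1)
            + recursive_partition(right, coords, levels - 1))
```

On a 16 x 4 grid of points, two levels of bisection cut across the long axis each time and yield four equal parts of 16 vertices. A full partitioner would also measure edge cut on the mesh graph, which this sketch ignores.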
A Scalable and Highly Available System for Serving Dynamic Data at Frequently Accessed Web Sites
Proceedings of the IEEE/ACM SC98 Conference, Pub Date: 1998-11-07, DOI: 10.1109/SC.1998.10044
J. Challenger, P. Dantzig, A. Iyengar
Abstract: This paper describes the system and key techniques used to achieve performance and high availability at the official Web site for the 1998 Olympic Winter Games, which was one of the most popular Web sites for the duration of the Games. The Web site used thirteen SP2 systems distributed around the globe, containing a total of 143 processors. A key feature of the Web site was that the data presented to clients was constantly changing. Whenever new results were entered into the system, updated Web pages reflecting the changes were made available to the rest of the world within seconds. One technique we used to serve dynamic data efficiently was to cache dynamic pages so that they only had to be generated once. We developed and implemented a new algorithm, Data Update Propagation (DUP), which identifies the cached pages that have become stale as a result of changes to the underlying data on which they depend, such as databases. For the Olympic Games Web site, we were able to update stale pages directly in the cache, which obviated the need to invalidate them. This allowed us to achieve cache hit rates of close to 100%. Our system served pages to clients quickly throughout the Olympic Games, even during peak periods. In addition, the site was available 100% of the time. We describe the key features our site employed for high availability. We also describe how the Web site was structured to provide useful information while requiring clients to examine only a small number of pages.
Citations: 87
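The abstract describes DUP only at a high level: cached pages depend on underlying data, and a change to that data must identify which cached pages became stale. A minimal sketch, assuming the dependencies are recorded as a directed graph from data items (and intermediate fragments) to the pages built from them: a breadth-first traversal from the changed node finds everything affected, and stale pages are regenerated in place rather than invalidated, mirroring the choice the abstract reports for the Olympic site. The class, its methods, and the `render` callback are hypothetical.

```python
from collections import defaultdict, deque

class DependencyCache:
    """Page cache that records which underlying objects each cached page depends on."""

    def __init__(self, render):
        self.render = render            # callback: page name -> freshly generated content
        self.edges = defaultdict(set)   # object -> objects or pages built directly from it
        self.cache = {}                 # page name -> cached content

    def add_dependency(self, source, target):
        self.edges[source].add(target)

    def get(self, page):
        if page not in self.cache:      # generate a page once, then serve it from the cache
            self.cache[page] = self.render(page)
        return self.cache[page]

    def affected_by(self, changed):
        """Every node reachable from `changed` in the dependence graph (breadth-first)."""
        seen, queue = set(), deque([changed])
        while queue:
            for nxt in self.edges[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def data_changed(self, changed):
        """Regenerate stale cached pages in place; return their names."""
        stale = sorted(p for p in self.affected_by(changed) if p in self.cache)
        for page in stale:
            self.cache[page] = self.render(page)
        return stale
```

Because stale entries are overwritten rather than dropped, the next request for an affected page is still a cache hit, which is how update-in-place keeps hit rates close to 100%.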
Highly Efficient Gang Scheduling Implementation
Proceedings of the IEEE/ACM SC98 Conference, Pub Date: 1998-11-07, DOI: 10.1109/SC.1998.10007
A. Hori, H. Tezuka, Y. Ishikawa
Abstract: This paper presents a new, more efficient technique for implementing gang scheduling. Network preemption, in which network interface contexts are saved and restored, has already been proposed to enable parallel applications to perform efficient user-level communication. This network preemption technique can also be used to detect global states, such as deadlock, of a parallel program execution. A gang scheduler, SCore-D, using the network preemption technique is implemented with PM, a user-level communication library. This paper evaluates the overhead of network-preemption gang scheduling using eight NAS parallel benchmark programs. The results show that saving and restoring network contexts accounts for almost half of the total gang scheduling overhead. A new mechanism is proposed that keeps multiple network contexts and merely switches context pointers, without saving and restoring the network contexts. The NAS parallel benchmark evaluation shows that this almost halves the gang scheduling overhead. The maximum gang scheduling overhead among the benchmark programs is less than 10%, with a 40 msec time slice on 64 single-way Pentium Pros connected by Myrinet to form a PC cluster. Counting secondary cache misses shows that network preemption with multiple network contexts is more cache-effective than with a single network context. The observed scheduling overhead for applications running on 64 nodes is only a small percentage of the execution time. The gang scheduling overhead of switching between two NAS parallel benchmark programs is also evaluated: the additional overhead is less than 2% in most cases, with a 100 msec time slice on 64 nodes. These slightly higher overheads, compared with switching a single parallel process, come from more frequent cache misses. This paper contributes the following findings: (i) gang scheduling overhead with network preemption can be sufficiently low; (ii) the proposed network preemption with multiple network contexts is more cache-effective than a single network context; and (iii) network preemption can be applied to detect global states of user parallel processes. By detecting the global state of user parallel processes, the SCore-D gang scheduler realized by network preemption can make effective use of processor resources. Network preemption with multiple contexts yields highly efficient gang scheduling. The combination of low scheduling overhead and the global state detection mechanism enables interactive parallel programming, in which parallel program development and production runs can be mixed freely.
Citations: 37
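The mechanism proposed in this abstract, keeping several network contexts resident and switching a pointer instead of saving and restoring one context, can be illustrated with a toy model. Everything here is hypothetical: a network interface with a fixed number of context slots, a round-robin gang scheduler, and a counter of context copies standing in for save/restore cost. It is not SCore-D or PM code and measures no real overhead.

```python
class NetworkInterface:
    """Toy network interface holding `slots` resident communication contexts (hypothetical model)."""

    def __init__(self, slots):
        self.slots = [None] * slots  # job id resident in each slot
        self.active = None           # index of the slot currently in use
        self.copies = 0              # context saves + restores performed

    def switch_to(self, job):
        if job in self.slots:
            # Multiple contexts: the job's state is already resident; only move the pointer.
            self.active = self.slots.index(job)
            return
        # Use a free slot if one exists, otherwise evict the slot after the active one.
        victim = self.slots.index(None) if None in self.slots else (self.active + 1) % len(self.slots)
        if self.slots[victim] is not None:
            self.copies += 1         # save the evicted job's context to host memory
        self.slots[victim] = job
        self.copies += 1             # restore (or initialize) the incoming job's context
        self.active = victim

def gang_schedule(jobs, slices, slots):
    """Round-robin `slices` time slices over `jobs`; return the number of context copies."""
    nic = NetworkInterface(slots)
    for t in range(slices):
        nic.switch_to(jobs[t % len(jobs)])
    return nic.copies
```

With two jobs and a single slot, every switch after the first costs a save plus a restore (19 copies over 10 slices); with one slot per job, only the two initial loads remain, which is the kind of saving the abstract attributes to multiple network contexts.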
An Initial Evaluation of the Tera Multithreaded Architecture and Programming System Using the C3I Parallel Benchmark Suite
Proceedings of the IEEE/ACM SC98 Conference, Pub Date: 1998-11-07, DOI: 10.1109/SC.1998.10048
S. Brunett, J. Thornley, M. Ellenbecker
Abstract: The Tera Multithreaded Architecture (MTA) is a radical new architecture intended to revolutionize high-performance computing in both the scientific and commercial marketplaces. Each processor supports 128 threads in hardware. Extremely fast thread switching is used to mask latency in a uniform-access memory system without caching. It is claimed that these hardware characteristics allow compilers to easily transform sequential programs into efficient multithreaded programs for the Tera MTA. In this paper, we attempt to provide an objective initial evaluation of the performance of the Tera multithreaded architecture and programming system for general-purpose applications. Our investigation is based on two programs from the C3I Parallel Benchmark Suite (C3IPBS), both of which have previously been shown to have potential for large-scale parallelization. We compare the performance of these programs on (i) a fast uniprocessor, (ii) two conventional shared-memory multiprocessors, and (iii) the first installed Tera MTA (at the San Diego Supercomputer Center). On these platforms, we compare the effectiveness of both automatic and manual parallelization.
Citations: 13
A Parallel Linear Algebra Server for Matlab-like Environments
Proceedings of the IEEE/ACM SC98 Conference, Pub Date: 1998-11-07, DOI: 10.1109/SC.1998.10015
G. Morrow, R. van de Geijn
Abstract: The PLAPACK Server Interface (PSI) is a parallel "back-end" for any of a number of interactive mathematics and visualization environments. From within an interactive session running on a workstation, the user can create and manipulate linear algebra objects (matrices, vectors, and scalars) that reside on the parallel computer. The user can insert matrix and vector data stored in the interactive package's native format into the parallel objects, and retrieve it from them. The user also has access to the functionality of PLAPACK, which currently includes factorizations (LU, QR, Cholesky), linear least squares, symmetric eigensolvers, and all of the BLAS.
Citations: 14
Automatically Tuned Linear Algebra Software
Proceedings of the IEEE/ACM SC98 Conference, Pub Date: 1998-11-07, DOI: 10.1109/SC.1998.10004
R. C. Whaley, Jack J. Dongarra
Abstract: This paper describes an approach for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units. Producing such software for machines ranging from desktop workstations to embedded processors can be a tedious and time-consuming process. The work described here can help automate much of this process. We concentrate our efforts on the widely used linear algebra kernels called the Basic Linear Algebra Subroutines (BLAS); in particular, the work presented here is for general matrix multiply, DGEMM. However, much of the technology and approach developed here can be applied to the other Level 3 BLAS, and the general strategy can have an impact on basic linear algebra operations in general and may be extended to other important kernel operations.
Citations: 1170
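The core idea of empirical tuning, timing candidate variants of a kernel on the target machine and keeping the fastest, fits in a few lines. A minimal sketch, assuming a single tunable parameter (the block size `nb` of a blocked matrix multiply) and a hypothetical candidate list; `blocked_matmul` and `tune_block_size` are illustrative names. ATLAS itself generates and times C code over many parameters (blocking, loop unrolling, and more), and pure-Python timings mostly reflect interpreter overhead rather than cache behavior.

```python
import time

def blocked_matmul(A, B, nb):
    """C = A @ B for n x n lists of lists, computed tile by tile with block size nb."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for ii in range(0, n, nb):
        for kk in range(0, n, nb):
            for jj in range(0, n, nb):
                for i in range(ii, min(ii + nb, n)):
                    Ai, Ci = A[i], C[i]
                    for k in range(kk, min(kk + nb, n)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + nb, n)):
                            Ci[j] += a * Bk[j]
    return C

def tune_block_size(n, candidates, repeats=3):
    """Time each candidate block size on an n x n problem; return (best_nb, timings)."""
    A = [[(i + 2 * j) % 7 for j in range(n)] for i in range(n)]
    B = [[(3 * i + j) % 5 for j in range(n)] for i in range(n)]
    timings = {}
    for nb in candidates:
        best = float("inf")
        for _ in range(repeats):  # keep the fastest run to reduce timing noise
            start = time.perf_counter()
            blocked_matmul(A, B, nb)
            best = min(best, time.perf_counter() - start)
        timings[nb] = best
    return min(timings, key=timings.get), timings
```

For example, `tune_block_size(48, [8, 16, 24, 48])` reports the fastest block size on the machine it runs on. In an ATLAS-style workflow such a search runs once at install time, and the winning parameters are baked into the generated kernel.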