{"title":"Efficient Execution of Multiple Query Workloads in Data Analysis Applications","authors":"H. Andrade, T. Kurç, A. Sussman, J. Saltz","doi":"10.1145/582034.582087","DOIUrl":"https://doi.org/10.1145/582034.582087","url":null,"abstract":"Applications that analyze, mine, and visualize large datasets are considered an important class of applications in many areas of science, engineering, and business. Queries commonly executed in data analysis applications often involve user-defined processing of data and application-specific data structures. If data analysis is employed in a collaborative environment, the data server should execute multiple such queries simultaneously to minimize the response time to clients. In this paper we present the design of a runtime system for executing multiple query workloads on a shared-memory machine. We describe experimental results using an application for browsing digitized microscopy images.","PeriodicalId":325282,"journal":{"name":"ACM/IEEE SC 2001 Conference (SC'01)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124378188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Solution of a Three-Body Problem in Quantum Mechanics Using Sparse Linear Algebra on Parallel Computers","authors":"M. Baertschy, X. Li","doi":"10.1145/582034.582081","DOIUrl":"https://doi.org/10.1145/582034.582081","url":null,"abstract":"A complete description of two outgoing electrons following an ionizing collision between a single electron and an atom or molecule has long stood as one of the unsolved fundamental problems in quantum collision theory. In this paper we describe our use of distributed memory parallel computers to calculate a fully converged wave function describing the electron-impact ionization of hydrogen. Our approach hinges on a transformation of the Schrödinger equation that simplifies the boundary conditions but requires solving very ill-conditioned systems of a few million complex, sparse linear equations. We developed a two-level iterative algorithm that requires repeated solution of sets of a few hundred thousand linear equations. These are solved directly by LU-factorization using a specially tuned, distributed memory parallel version of the sparse LU-factorization library Super-LU. In smaller cases, where direct solution is technically possible, our iterative algorithm still gives significant savings in time and memory despite lower megaflop rates.","PeriodicalId":325282,"journal":{"name":"ACM/IEEE SC 2001 Conference (SC'01)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131836659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Automatic Design Optimization Tool and its Application to Computational Fluid Dynamics","authors":"D. Abramson, A. Lewis, T. Peachey, C. Fletcher","doi":"10.1145/582034.582059","DOIUrl":"https://doi.org/10.1145/582034.582059","url":null,"abstract":"In this paper we describe the Nimrod/O design optimization tool, and its application in computational fluid dynamics. Nimrod/O facilitates the use of an arbitrary computational model to drive an automatic optimization process. This means that the user can parameterise an arbitrary problem, and then ask the tool to compute the parameter values that minimize or maximise a design objective function. The paper describes the Nimrod/O system, and then discusses a case study in the evaluation of an aerofoil problem. The problem involves computing the shape and angle of attack of the aerofoil that maximises the lift to drag ratio. The results show that our general approach is extremely flexible and delivers better results than a program that was developed specifically for the problem. Moreover, it only took us a few hours to set up the tool for the new problem and required no software development.","PeriodicalId":325282,"journal":{"name":"ACM/IEEE SC 2001 Conference (SC'01)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114625113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LegionFS: A Secure and Scalable File System Supporting Cross-Domain High-Performance Applications","authors":"B. White, M. Walker, M. Humphrey, A. Grimshaw","doi":"10.1145/582034.582093","DOIUrl":"https://doi.org/10.1145/582034.582093","url":null,"abstract":"Realizing that current file systems can not cope with the diverse requirements of wide-area collaborations, researchers have developed data access facilities to meet their needs. Recent work has focused on comprehensive data access architectures. In order to fulfill the evolving requirements in this environment, we suggest a more fully-integrated architecture built upon the fundamental tenets of naming, security, scalability, extensibility, and adaptability. These form the underpinning of the Legion File System (LegionFS). This paper motivates the need for these requirements and presents benchmarks that highlight the scalability of LegionFS. LegionFS aggregate throughput follows the linear growth of the network, yielding an aggregate read bandwidth of 193.8 MB/s on a 100 Mbps Ethernet backplane with 50 simultaneous readers. The serverless architecture of LegionFS is shown to benefit important scientific applications, such as those accessing the Protein Data Bank, within both local- and wide-area environments.","PeriodicalId":325282,"journal":{"name":"ACM/IEEE SC 2001 Conference (SC'01)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117271603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Terascale Spectral Element Dynamical Core for Atmospheric General Circulation Models","authors":"R. Loft, Stephen J. Thomas, J. Dennis","doi":"10.1145/582034.582052","DOIUrl":"https://doi.org/10.1145/582034.582052","url":null,"abstract":"Climate modeling is a grand challenge problem where scientific progress is measured not in terms of the largest problem that can be solved but by the highest achievable integration rate. These models have been notably absent in previous Gordon Bell competitions due to their inability to scale to large processor counts. A scalable and efficient spectral element atmospheric model is presented. A new semi-implicit time stepping scheme accelerates the integration rate relative to an explicit model by a factor of two, achieving 130 years per day at T63L30 equivalent resolution. Execution rates are reported for the standard shallow water and Held-Suarez climate benchmarks on IBM SP clusters. The explicit T170 equivalent multi-layer shallow water model sustains 343 Gflops at NERSC, 206 Gflops at NPACI (SDSC) and 127 Gflops at NCAR. An explicit Held-Suarez integration sustains 369 Gflops on 128 16-way IBM nodes at NERSC.","PeriodicalId":325282,"journal":{"name":"ACM/IEEE SC 2001 Conference (SC'01)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124067766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and Implementation of FMPL, a Fast Message-Passing Library for Remote Memory Operations","authors":"O. Tatebe, U. Nagashima, S. Sekiguchi, Hisayoshi Kitabayashi, Y. Hayashida","doi":"10.1145/582034.582049","DOIUrl":"https://doi.org/10.1145/582034.582049","url":null,"abstract":"A fast message-passing library FMPL has been designed and developed to maximize communication performance by utilizing general architectural communication support such as remote memory operations, as well as to maximize total performance by eliminating dynamic communication overhead and overlapping communication and computation. FMPL provides a low-cost general-purpose point-to-point communication and collective communication such as broadcast, barrier synchronization and reduction. On a Hitachi SR8000, FMPL achieves an 8-byte latency of 12.8µsec., while MPI achieves 20µsec. FMPL is designed for building more highly functional message-passing libraries like BLACS as well as applications that need maximum performance.","PeriodicalId":325282,"journal":{"name":"ACM/IEEE SC 2001 Conference (SC'01)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130656824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compressing Inverted Files in Scalable Information Systems by Binary Decision Diagram Encoding","authors":"Chung-Hung Lai, Tien-Fu Chen","doi":"10.1145/582034.582094","DOIUrl":"https://doi.org/10.1145/582034.582094","url":null,"abstract":"One of the key challenges of managing very huge volumes of data in scalable Information retrieval systems is providing fast access through keyword searches. The major data structure in the information retrieval system is an inverted file, which records the positions of each term in the documents. When the information set substantially grows, the number of terms and documents are significantly increased as well as the size of the inverted files. Approaches to reduce the inverted file without sacrificing the query efficiency are important to the success of scalable information systems. In this paper, we propose a compression approach by using Binary Decision Diagram Encoding (BDD) so that all possible ordering correlation among large amount of documents will be extracted to minimize the posting representation. Another advantage of using BDD is that BDD expressions can efficiently perform Boolean queries, which are very common in retrieval systems. Experiment results show that the compression ratios of the inverted files have been improved significantly by the BDD scheme.","PeriodicalId":325282,"journal":{"name":"ACM/IEEE SC 2001 Conference (SC'01)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130910315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supporting Efficient Execution in Heterogeneous Distributed Computing Environments with Cactus and Globus","authors":"Gabrielle Allen, Thomas Dramlitsch, Ian T Foster, N. Karonis, M. Ripeanu, E. Seidel, B. Toonen","doi":"10.1145/582034.582086","DOIUrl":"https://doi.org/10.1145/582034.582086","url":null,"abstract":"Improvements in the performance of processors and networks make it both feasible and interesting to treat collections of workstations, servers, clusters, and supercomputers as integrated computational resources, or Grids. However, the highly heterogeneous and dynamic nature of such Grids can make application development difficult. Here we describe an architecture and prototype implementation for a Grid-enabled computational framework based on Cactus, the MPICH-G2 Grid-enabled message-passing library, and a variety of specialized features to support efficient execution in Grid environments. We have used this framework to perform record-setting computations in numerical relativity, running across four supercomputers and achieving scaling of 88% (1140 CPUs) and 63% (1500 CPUs). The problem size we were able to compute was about five times larger than any other previous run. Further, we introduce and demonstrate adaptive methods that automatically adjust computational parameters during run time, to increase dramatically the efficiency of a distributed Grid simulation, without modification of the application and without any knowledge of the underlying network connecting the distributed computers.","PeriodicalId":325282,"journal":{"name":"ACM/IEEE SC 2001 Conference (SC'01)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116556869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stable, Globally Non-Iterative, Non-Overlapping Domain Decomposition Parallel Solvers for Parabolic Problems","authors":"Y. Zhuang, Xian-He Sun","doi":"10.1145/582034.582053","DOIUrl":"https://doi.org/10.1145/582034.582053","url":null,"abstract":"In this paper, we report a class of stabilized explicit-implicit domain decomposition (SEIDD) methods for the parallel solution of parabolic problems, based on the explicit-implicit domain decomposition (EIDD) methods. EIDD methods are globally non-iterative, non-overlapping domain decomposition methods which, when compared with Schwarz alternating algorithm based parabolic solvers, are computationally and communicationally efficient for each simulation time step but suffer from time step size restrictions due to conditional stability or conditional consistency. By adding a stabilization step to the EIDD methods, the SEIDD methods are freed from time step size restrictions while retaining EIDD’s computational and communicational efficiency for each time step, rendering themselves excellent candidates for large-scale parallel simulations. Three algorithms of the SEIDD type are implemented, which are experimentally tested to show excellent stability, computation and communication efficiencies, and high parallel speedup and scalability.","PeriodicalId":325282,"journal":{"name":"ACM/IEEE SC 2001 Conference (SC'01)","volume":"291 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113995356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling of Seismic Wave Propagation at the Scale of the Earth on a Large Beowulf","authors":"D. Komatitsch, J. Tromp","doi":"10.1145/582034.582076","DOIUrl":"https://doi.org/10.1145/582034.582076","url":null,"abstract":"We use a parallel spectral-element method to simulate the propagation of seismic waves generated by earthquakes in the entire 3-D Earth. The method is implemented using MPI on a large PC cluster (Beowulf) with 151 processors and 76 Gb of RAM. It is based upon a weak formulation of the equations of motion and combines the flexibility of a finite-element method with the accuracy of a pseudospectral method. The finite-element mesh honors all discontinuities in the Earth velocity model. To maintain a relatively constant number of grid points per seismic wavelength, the size of the elements is increased with depth in a conforming fashion, thus retaining a diagonal mass matrix. The effects of attenuation and anisotropy are incorporated. We benchmark spectral-element synthetic seismograms against a normal-mode reference solution for a spherically symmetric Earth velocity model. The two methods are in excellent agreement for all waves with periods greater than 20 seconds.","PeriodicalId":325282,"journal":{"name":"ACM/IEEE SC 2001 Conference (SC'01)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115407633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}