ACM/IEEE SC 1999 Conference (SC'99): Latest Papers (Page 2)

Using the NREN Testbed to Prototype a High-Performance Multicast Application

ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331539

Marjory J. Johnson, M. C. Spence, L. Chao

Citations: 3

High Performance Computing with the Array Package for Java: A Case Study using Data Mining

ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331542

J. Moreira, S. Midkiff, M. Gupta, Richard D. Lawrence

Abstract: This paper discusses several techniques used in developing a parallel, production quality data mining application in Java. We started by developing three sequential versions of a product recommendation data mining application: (i) a Fortran 90 version used as a performance reference, (ii) a plain Java implementation that only uses the primitive array structures from the language, and (iii) a baseline Java implementation that uses our Array package for Java. This Array package provides parallelism at the level of individual Array and BLAS operations. Using this Array package, we also developed two parallel Java versions of the data mining application: one that relies entirely on the implicit parallelism provided by the Array package, and another that is explicitly parallel at the application level. We discuss the design of the Array package, as well as the design of the data mining application. We compare the trade-offs between performance and the abstraction level the different Java versions present to the application programmer. Our studies show that, although a plain Java implementation performs poorly, the Java implementation with the Array package is quite competitive in performance with Fortran. We achieve a single processor performance of 109 Mflops, or 91% of Fortran performance, on a 332 MHz PowerPC 604e processor. Both the implicitly and explicitly parallel forms of our Java implementations also parallelize well. On an SMP with four of those PowerPC processors, the implicitly parallel form achieves 290 Mflops with no effort from the application programmer, while the explicitly parallel form achieves 340 Mflops.

Citations: 15
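
The throughput figures quoted in the abstract above imply specific speedups and parallel efficiencies. The following is a small sketch of that arithmetic only. The function and variable names are illustrative and not taken from the paper, and it assumes the 109 Mflops single-processor Java run is the right baseline for the four-processor figures.

```python
# Figures quoted in the abstract (Mflops on 332 MHz PowerPC 604e processors).
SINGLE_CPU_JAVA = 109.0      # Java with the Array package, one processor
FRACTION_OF_FORTRAN = 0.91   # Java reaches 91% of Fortran performance
IMPLICIT_4CPU = 290.0        # implicitly parallel form, four processors
EXPLICIT_4CPU = 340.0        # explicitly parallel form, four processors
PROCESSORS = 4


def speedup(parallel_mflops, serial_mflops):
    """Speedup relative to the single-processor Java run (assumed baseline)."""
    return parallel_mflops / serial_mflops


def efficiency(parallel_mflops, serial_mflops, processors):
    """Parallel efficiency: speedup divided by the processor count."""
    return speedup(parallel_mflops, serial_mflops) / processors


# Implied Fortran reference rate (derived here, not stated in the abstract).
implied_fortran = SINGLE_CPU_JAVA / FRACTION_OF_FORTRAN

print(f"implied Fortran rate: {implied_fortran:.1f} Mflops")
for label, rate in (("implicit", IMPLICIT_4CPU), ("explicit", EXPLICIT_4CPU)):
    s = speedup(rate, SINGLE_CPU_JAVA)
    e = efficiency(rate, SINGLE_CPU_JAVA, PROCESSORS)
    print(f"{label}: speedup {s:.2f}x, efficiency {e:.0%}")
```

Under that assumption, the implicit form reaches roughly 2.7x on four processors and the explicit form roughly 3.1x, so the extra application-level effort buys about 17% more throughput.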

Improving Online Performance Diagnosis by the Use of Historical Performance Data

ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331574

K. Karavanic, B. Miller

Citations: 34

Parallel Newton-Krylov Methods for PDE-Constrained Optimization

ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331560

G. Biros, O. Ghattas

Abstract: Large scale optimization of systems governed by partial differential equations (PDEs) is a frontier problem in scientific computation. The state-of-the-art for solving such problems is reduced-space quasi-Newton sequential quadratic programming (SQP) methods. These take full advantage of existing PDE solver technology and parallelize well. However, their algorithmic scalability is questionable; for certain problem classes they can be very slow to converge. In this paper we propose a full-space Newton-Krylov SQP method that uses the reduced-space quasi-Newton method as a preconditioner. The new method is fully parallelizable; exploits the structure of and available parallel algorithms for the PDE forward problem; and is quadratically convergent close to a local minimum. We restrict our attention to boundary value problems and we solve a model optimal flow control problem, with both Stokes and Navier-Stokes equations as constraints. Algorithmic comparisons, scalability results, and parallel performance on a Cray T3E-900 are presented. On the model problems solved, the new method is a factor of 5-10 faster than reduced space quasi-Newton SQP, and is scalable provided a good forward preconditioner is available.

Citations: 26
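
For orientation, the linear system at the heart of a full-space SQP step can be written in textbook notation. This is a generic sketch, not the paper's formulation: the symbols x (all state and decision variables), f (objective), c (the discretized PDE constraints), and the multiplier vector are assumptions introduced here.

```latex
% Equality-constrained problem and its Lagrangian (generic notation, assumed):
%   \min_x f(x) \quad \text{subject to} \quad c(x) = 0,
%   \mathcal{L}(x, \lambda) = f(x) + \lambda^{T} c(x).
%
% One Newton step on the first-order optimality conditions solves the KKT system
\begin{bmatrix} W & A^{T} \\ A & 0 \end{bmatrix}
\begin{bmatrix} p \\ \lambda_{+} \end{bmatrix}
= -
\begin{bmatrix} g \\ c \end{bmatrix},
\qquad
W = \nabla_{xx}^{2} \mathcal{L}, \quad
A = \nabla c, \quad
g = \nabla f,
% where p is the step in x and \lambda_{+} is the updated multiplier estimate.
```

A full-space method of the kind the abstract describes applies a Krylov solver to this whole system at once, while a reduced-space method first eliminates the state variables using the constraint block A. The abstract's proposal is to use the latter as a preconditioner for the former.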

Locality Optimizations for Multi-Level Caches

ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331534

Gabriel Rivera, C. Tseng

Citations: 52

Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture

ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331589

Mary W. Hall, P. Kogge, J. Koller, P. Diniz, Jacqueline Chame, J. Draper, J. LaCoss, J. Granacki, J. Brockman, Apoorv Srivastava, W. Athas, V. Freeh, Jaewook Shin, Joonseok Park

Abstract: Processing-in-memory (PIM) chips that integrate processor logic into memory devices offer a new opportunity for bridging the growing gap between processor and memory speeds, especially for applications with high memory-bandwidth requirements. The Data-IntensiVe Architecture (DIVA) system combines PIM memories with one or more external host processors and a PIM-to-PIM interconnect. DIVA increases memory bandwidth through two mechanisms: (1) performing selected computation in memory, reducing the quantity of data transferred across the processor-memory interface; and (2) providing communication mechanisms called parcels for moving both data and computation throughout memory, further bypassing the processor-memory bus. DIVA uniquely supports acceleration of important irregular applications, including sparse-matrix and pointer-based computations. In this paper, we focus on several aspects of DIVA designed to effectively support such computations at very high performance levels: (1) the memory model and parcel definitions; (2) the PIM-to-PIM interconnect; and (3) requirements for the processor-to-memory interface. We demonstrate the potential of PIM-based architectures in accelerating the performance of three irregular computations: sparse conjugate gradient, a natural-join database operation, and an object-oriented database query.

Citations: 226
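
To make "irregular" concrete: the dominant kernel in sparse conjugate gradient, one of the computations named in the abstract above, is a sparse matrix-vector product. The sketch below uses the common compressed sparse row (CSR) layout. It is a generic illustration of the memory-access pattern, not DIVA code, and the function and variable names are assumptions.

```python
def csr_matvec(values, col_index, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        total = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_index: an indirect, data-dependent access
            # that caches and prefetchers handle poorly.
            total += values[k] * x[col_index[k]]
        y[row] = total
    return y


# 3x3 example matrix: [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_index = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

print(csr_matvec(values, col_index, row_ptr, x))
```

Each multiply-add needs a fresh value, a fresh column index, and a scattered read of x, so the kernel does little arithmetic per byte moved. That is the kind of bandwidth-bound workload the abstract argues benefits from computing inside memory.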

Data Organization and I/O in a Parallel Ocean Circulation Model

ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331565

C. Ding, Yun He

Citations: 3

Papyrus: A System for Data Mining over Local and Wide Area Clusters and Super-Clusters

ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331595

Stuart Bailey, R. Grossman, H. Sivakumar, Andrei L. Turinsky

Abstract: Data mining is the semi-automatic discovery of patterns, correlations, changes, associations, and anomalies in large data sets. Traditionally, in a broad sense, statistics has focused on the assumption-driven analysis of data, while data mining has focused on the discovery-driven analysis of data. By discovery-driven, we mean the automatic search or semi-automatic search for interesting patterns and models. With the explosion of the commodity internet and the emergence of wide area high performance networks, mining distributed data is becoming recognized as a fundamental scientific challenge. In this paper, we introduce a system called Papyrus for distributed data mining over commodity and high performance networks and give some preliminary experimental results about its performance. We are particularly interested in data mining over clusters of workstations, distributed clusters connected by high performance networks (super-clusters), and distributed clusters and super-clusters connected by commodity networks (meta-clusters). As a motivating example taken from [7], consider the problem of searching for correlations between twenty-five years of sunspot data archived on a server in Boulder and 80 years of Southern night marine air temperature data archived on a server in Maryland. The goal of this data mining query might be to understand whether sunspots are correlated with climatic shifts in temperature. Notice that ...

Citations: 113

Multivariate Geographic Clustering in A Metacomputing Environment Using Globus

ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331537

G. Mahinthakumar, F. Hoffman, W. Hargrove, N. Karonis

Abstract: The authors present a metacomputing application of multivariate, nonhierarchical statistical clustering to geographic environmental data from the 48 conterminous United States in order to produce maps of regions of ecological similarity, called ecoregions. These maps represent finer scale regionalizations than do those generated by the traditional technique: an expert with a marker pen. Several variables (e.g., temperature, organic matter, rainfall, etc.) thought to affect the growth of vegetation are clustered at resolutions as fine as one square kilometer (1 km²). These data can represent over 7.8 million map cells in an n-dimensional (n = 9 to 25) data space. A parallel version of the iterative statistical clustering algorithm is developed by the authors using the MPI (Message Passing Interface) message passing routines. The parallel algorithm uses a classical, self-scheduling, single-program, multiple-data (SPMD) organization; performs dynamic load balancing for reasonable performance in heterogeneous metacomputing environments; and provides fault tolerance by saving intermediate results for easy restarts in case of hardware failure. The parallel algorithm was tested on various geographically distributed heterogeneous metacomputing configurations involving an IBM SP3, an IBM SP2, and two SGI Origin 2000s. The tests were performed with minimal code modification, and were made possible by Globus (a metacomputing software toolkit) and the Globus-enabled version of MPI (MPICH-G). Our performance tests indicate that while the algorithm works reasonably well under the metacomputing environment for a moderate number of processors, the communication overhead can become prohibitive for large processor configurations.

Citations: 31
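
The self-scheduling organization described in the abstract above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' code: it assumes a k-means-style update as the iterative nonhierarchical clustering step, uses Python threads and a shared queue in place of MPI processes, and omits the checkpointing used for fault tolerance. All names are hypothetical.

```python
import queue
import threading


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )


def worker(tasks, centroids, partials, lock):
    """Pull blocks until the queue is empty, then publish local sums."""
    dims = len(centroids[0])
    sums = [[0.0] * dims for _ in centroids]
    counts = [0] * len(centroids)
    while True:
        try:
            # Self-scheduling: a faster worker simply comes back for more
            # blocks, which balances load across unequal machines.
            block = tasks.get_nowait()
        except queue.Empty:
            break
        for point in block:
            c = nearest(point, centroids)
            counts[c] += 1
            for d in range(dims):
                sums[c][d] += point[d]
    with lock:
        partials.append((sums, counts))


def cluster_step(points, centroids, block_size=2, n_workers=3):
    """One iteration: assign every point, then recompute the centroids."""
    tasks = queue.Queue()
    for i in range(0, len(points), block_size):
        tasks.put(points[i:i + block_size])
    partials, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, centroids, partials, lock))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    new_centroids = []
    for c, old in enumerate(centroids):
        count = sum(p[1][c] for p in partials)
        if count == 0:
            new_centroids.append(list(old))  # empty cluster keeps its centroid
            continue
        new_centroids.append(
            [sum(p[0][c][d] for p in partials) / count for d in range(len(old))]
        )
    return new_centroids


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0), (2.0, 0.0)]
print(cluster_step(points, [[1.0, 1.0], [9.0, 9.0]]))
```

Only the per-cluster sums and counts travel back to the coordinator, never the points themselves. Even so, that exchange recurs on every iteration, which is consistent with the abstract's observation that communication overhead grows with the number of processors.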

Parallel Multigrid Solver for 3D Unstructured Finite Element Problems

ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331559

M. Adams, J. Demmel

Citations: 28