ACM/IEEE SC 1999 Conference (SC'99) Latest Papers

A Programmable Preprocessor for Parallelizing Fortran-90
ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331535
M. Rosing, Steve Yabusaki
A programmable preprocessor that generates portable and efficient parallel Fortran-90 code has been successfully used in the development of a variety of environmental transport simulators for the Department of Energy. The tool provides the basic functionality of a traditional preprocessor, in which directives are embedded in a serial Fortran program and interpreted by the preprocessor to produce parallel Fortran code with MPI calls. The unique aspect of this work is that the user can add to or modify these directives. The directives reside in a preprocessor library, and changes to this library can range from small changes that customize an existing library, to larger changes that port a library, to replacing the library entirely. The preprocessor is programmed with a library of directives written in a C-like language, called DL, with added support for manipulating Fortran code fragments. The primary benefits to the user are twofold: it is fairly easy for any user to generate efficient parallel code from Fortran-90 with embedded directives, and the long-term viability of the user's software is guaranteed. This is because the source code will always run on a serial machine (the directives are transparent to standard Fortran compilers), and the preprocessor library can be modified to work with different hardware and software environments. A 4,000-line preprocessor library has been written and used to parallelize roughly 50,000 lines of groundwater modeling code. The programs have been ported to a wide range of parallel architectures, and their performance is similar to that of programs written explicitly for a parallel machine.
Binaries of the preprocessor core, as well as the preprocessor library source code used in our groundwater modeling codes, are currently available.
Citations: 6
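To make the idea of compiler-transparent directives expanded from a user-editable library concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' tool: the `!DL$` sentinel, the directive names `EXCHANGE` and `GLOBAL_SUM`, and the subroutine `exchange_ghost_cells` are all hypothetical, and the real library is written in DL rather than Python.

```python
import re

# Hypothetical directive library: directive name -> Fortran template.
# Editing, extending, or swapping this dict stands in for changing the
# preprocessor library (the real tool writes its library in DL, not Python).
DIRECTIVE_LIBRARY = {
    "EXCHANGE": "CALL exchange_ghost_cells({args})",  # hypothetical subroutine
    "GLOBAL_SUM": "CALL MPI_ALLREDUCE(MPI_IN_PLACE, {args}, 1, "
                  "MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD, ierr)",
}

# Directives are ordinary Fortran comments ("!"), so a serial compiler ignores them.
DIRECTIVE_RE = re.compile(r"^(\s*)!DL\$\s+(\w+)\s*\((.*)\)\s*$")


def preprocess(source: str, library: dict = DIRECTIVE_LIBRARY) -> str:
    """Expand recognized directive lines; pass all other lines through unchanged."""
    out = []
    for line in source.splitlines():
        match = DIRECTIVE_RE.match(line)
        if match and match.group(2).upper() in library:
            indent, name, args = match.groups()
            out.append(indent + library[name.upper()].format(args=args.strip()))
        else:
            out.append(line)  # plain Fortran and unknown directives are kept as-is
    return "\n".join(out)


serial_code = """      DO i = 1, n
        a(i) = b(i) + c(i)
      END DO
      !DL$ EXCHANGE(a)
      !DL$ GLOBAL_SUM(total)"""
print(preprocess(serial_code))
```

Because the directive lines are comments, the unprocessed `serial_code` still compiles and runs serially, which is the portability property the abstract emphasizes; retargeting only requires editing the templates.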
Parallelization of Radiance For Real Time Interactive Lighting Visualization Walkthroughs
ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331593
D. Robertson, K. Campbell, Stephen Lau, T. Ligocki
Radiance is a software package developed at Lawrence Berkeley National Laboratory for lighting visualization. Lighting visualization predicts how lighting would appear if a modelled scene were physically realized. Unlike most lighting systems, Radiance physically models the effects of lighting, producing an image that is closer to physical reality, which is of obvious benefit to architects and lighting designers. Such visualizations are computationally expensive: rendering a single image can take hours on a standard workstation. Ideally, an architect would like to navigate interactively through a scene to get a full impression of the true appearance of a particular model. With this goal in mind, we have (1) developed a geometry-based method (point cloud) to reuse pixels from a previous frame and (2) developed a parallel, distributed-memory implementation of Radiance and the point cloud, using MPI for inter-processor communication.
Citations: 16
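The abstract gives no details of the point-cloud method, so the following is only a generic illustration of reusing pixels from a previous frame: cached shaded 3D samples are reprojected into the new view with a z-buffer, and pixels that receive no sample (holes) are the only ones left to render. The camera model (translation only, looking down +z, focal length in pixels) and all function names are assumptions.

```python
import math
from typing import List, Optional, Tuple

# A cached sample from an earlier frame: world-space x, y, z and its shaded value.
Sample = Tuple[float, float, float, float]


def reproject(samples: List[Sample], camera: Tuple[float, float, float],
              width: int, height: int, focal: float) -> List[List[Optional[float]]]:
    """Splat cached samples into a new view; None marks a hole that must be re-rendered.

    Assumes a pinhole camera that only translates and always looks down +z,
    with the focal length given in pixels and the principal point at the image center.
    """
    image: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    depth = [[math.inf] * width for _ in range(height)]
    cx, cy, cz = camera
    for x, y, z, value in samples:
        dx, dy, dz = x - cx, y - cy, z - cz
        if dz <= 0.0:
            continue  # sample is behind the new camera position
        u = math.floor(width / 2 + focal * dx / dz)
        v = math.floor(height / 2 + focal * dy / dz)
        if 0 <= u < width and 0 <= v < height and dz < depth[v][u]:
            depth[v][u] = dz  # z-buffer: the nearest sample wins the pixel
            image[v][u] = value
    return image


def holes(image: List[List[Optional[float]]]) -> List[Tuple[int, int]]:
    """Pixels with no reusable sample; only these would be sent to the renderer."""
    return [(u, v) for v, row in enumerate(image) for u, p in enumerate(row) if p is None]
```

In a distributed-memory setting the hole list is a natural unit of work to divide among processors, although the abstract does not say how the authors partition the rendering.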
Adaptive, Multiresolution Visualization of Large Data Sets using a Distributed Memory Octree
ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331592
L. Diachin, R. Loy
The interactive visualization and exploration of large scientific data sets is a challenging task; their size often far exceeds the performance and memory capacity of even the most powerful graphics workstations. To address this problem, we have created a technique that combines hierarchical data reduction methods with parallel computing to allow interactive exploration of large data sets while retaining full-resolution capability. The user may interactively change the resolution of the reduced data set either globally or by specifying a region of interest. In this way, high resolution can be obtained in local subregions without sacrificing graphics performance. We describe the software architecture of the system, give details of the distributed memory octree used to create the reduced data set, and present performance results for the visualization of Rayleigh-Taylor instability and x-ray burst simulation data sets.
Citations: 44
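The abstract describes hierarchical data reduction without naming the reduction operator. Assuming simple averaging on a cubic grid with a power-of-two edge length (our assumption, and a serial one, whereas the paper's octree is distributed across processors), one octree level of reduction can be sketched in Python as follows.

```python
from typing import List

Grid = List[List[List[float]]]  # cubic grid indexed grid[z][y][x]


def coarsen(grid: Grid) -> Grid:
    """Average each 2x2x2 block of children into one parent cell (one octree level up)."""
    n = len(grid)
    if n % 2 != 0:
        raise ValueError("grid edge length must be even")
    half = n // 2
    return [[[sum(grid[2 * k + dz][2 * j + dy][2 * i + dx]
                  for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)) / 8.0
              for i in range(half)]
             for j in range(half)]
            for k in range(half)]


def build_pyramid(grid: Grid) -> List[Grid]:
    """Return all levels, from full resolution down to a single root cell."""
    levels = [grid]
    while len(levels[-1]) > 1:
        levels.append(coarsen(levels[-1]))
    return levels
```

Serving a coarse level globally and finer levels only inside a user-selected region of interest mirrors the interaction the abstract describes.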
A Parallel Implementation of the TOUGH2 Software Package for Large Scale Multiphase Fluid and Heat Flow Simulations
ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331584
E. Elmroth, C. Ding, Yu-Shu Wu, K. Pruess
TOUGH2 is a widely used simulation package for solving groundwater-flow-related problems such as nuclear waste isolation, environmental remediation, and geothermal reservoir engineering. It solves a set of coupled mass and energy balance equations using a finite volume method. The parallel implementation first partitions the unstructured computational domain. At each time step, a set of coupled nonlinear equations is solved with Newton iteration. In each Newton step, a Jacobian matrix is calculated and an ill-conditioned, non-symmetric linear system is solved in parallel using a preconditioned iterative solver. Communication is required for convergence tests and for data exchange across partition borders. A real problem with 17,584 blocks and 43,815 connections shows good scalability: going from 2 to 128 processors on the Cray T3E reduces the solution time from 7,984 to 126 seconds. Parallel performance is expected to improve further for larger problems, with 10⁵ to 10⁶ blocks, in a Yucca Mountain nuclear waste site study.
Citations: 7
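The scaling figures in the abstract can be checked with a few lines of Python. Because the smallest reported run used 2 processors, speedup is measured relative to that run rather than to a serial run; the function name is our own.

```python
def relative_speedup(t_base: float, p_base: int, t: float, p: int) -> tuple:
    """Speedup and parallel efficiency relative to a run on p_base processors."""
    speedup = t_base / t          # how much faster the larger run is
    ideal = p / p_base            # speedup expected from perfect scaling
    return speedup, speedup / ideal


# Figures from the abstract: 7,984 s on 2 processors, 126 s on 128 processors.
speedup, efficiency = relative_speedup(7984.0, 2, 126.0, 128)
print(f"speedup {speedup:.1f}x (ideal 64x), efficiency {efficiency:.1%}")
```

This gives a speedup of about 63.4 against an ideal of 64, i.e. roughly 99% parallel efficiency relative to the 2-processor run.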
A New Switch Chip for IBM RS/6000 SP Systems
ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331548
C. Stunkel, Jay Herring, B. Abali, Rajeev Sivaram
This paper describes the architecture of a third-generation switching element that may appear in future IBM RS/6000 SP interconnection networks. In this paper, this ASIC is referred to as the Switch3 switch chip. Like its predecessors, Switch3 is an 8-port device implementing output queuing using the high-utilization central-buffering technique. However, Switch3 offers significant enhancements over the existing SP switch chips by incorporating advances both in VLSI technology and in recent interconnection network research. Switch3 introduces a new form of adaptive routing with the potential to significantly improve network bandwidth. It also supports collective communication through a powerful hardware multicast replication capability. The technology advances allow link bandwidth to be improved to 500 MB/s per direction per link, and allow the central buffer size to be doubled compared to the current SP switch. Furthermore, the larger Switch3 input buffers can support link lengths of up to 100 meters, enabling richly connected, scalable topologies with high aggregate bandwidth.
Finally, Switch3 offers a number of other significant enhancements, including limited support for high-priority traffic and detailed performance monitoring information.
Citations: 31
Industrial Seismic Imaging on Commodity Supercomputers
ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331543
S. Morton, Jeffrey R. Davis, Harry L. Duffey, Gary L. Donathan, Vic Forsyth, S. Checkles
The petroleum industry has been a strong consumer of traditional supercomputers, routinely running technical applications that process terabytes of data and require Gflops-years of computation. Now that comparable parallel supercomputers can be assembled from commodity components for about one-tenth the price of a traditional supercomputer, it is inevitable that the petroleum industry will adopt commodity systems. In this paper, we first give an overview of industrial seismic imaging for petroleum exploration, including the physics and computational approach of the predominant algorithm. We then describe our decision-making process for, and experience in, constructing a commodity supercomputer. This machine has been performing production seismic imaging since January 1, 1999.
Citations: 1
A Unifying Data Structure for Hierarchical Methods
ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331556
F. E. Sevilgen, S. Aluru
We present a data structure that supports the access patterns required by most scientific applications that employ hierarchical methods. The data structure, termed the Distribution Independent Adaptive Tree, efficiently supports both grid-based and particle-based methods. We present efficient algorithms for most access patterns encountered in such applications: particle insertion/deletion/splitting, grid cell insertion/deletion, nearest neighbor queries, spherical region queries, and computing long-range interactions. Apart from being an efficient data structure for an individual hierarchical method, it is useful in applications that involve the simultaneous use of multiple methods.
Citations: 8
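The abstract lists spherical region queries among the supported access patterns. The Distribution Independent Adaptive Tree itself is not reproduced here; as a generic illustration only, the Python sketch below uses an ordinary 2D point quadtree (a 3D octree works the same way with eight children) and answers a circular region query by pruning cells whose nearest point lies outside the radius. The class and method names and the leaf capacity are our own choices.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Pt = Tuple[float, float]


@dataclass
class Node:
    x0: float
    y0: float
    size: float                       # square cell [x0, x0+size] x [y0, y0+size]
    points: List[Pt] = field(default_factory=list)
    children: Optional[List["Node"]] = None
    capacity: int = 4                 # a leaf splits when it holds more than this

    def insert(self, p: Pt) -> None:
        if self.children is None:
            self.points.append(p)
            if len(self.points) > self.capacity and self.size > 1e-9:
                self._split()
            return
        self._child_for(p).insert(p)

    def _split(self) -> None:
        h = self.size / 2
        # Children ordered so that index = 2 * iy + ix.
        self.children = [Node(self.x0 + dx * h, self.y0 + dy * h, h, capacity=self.capacity)
                         for dy in (0, 1) for dx in (0, 1)]
        pts, self.points = self.points, []
        for q in pts:
            self._child_for(q).insert(q)

    def _child_for(self, p: Pt) -> "Node":
        h = self.size / 2
        ix = 1 if p[0] >= self.x0 + h else 0
        iy = 1 if p[1] >= self.y0 + h else 0
        return self.children[2 * iy + ix]

    def query_circle(self, c: Pt, r: float) -> List[Pt]:
        # Skip the whole cell if even its closest point to c is farther than r.
        nx = min(max(c[0], self.x0), self.x0 + self.size)
        ny = min(max(c[1], self.y0), self.y0 + self.size)
        if (nx - c[0]) ** 2 + (ny - c[1]) ** 2 > r * r:
            return []
        if self.children is None:
            return [p for p in self.points
                    if (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 <= r * r]
        return [p for ch in self.children for p in ch.query_circle(c, r)]
```

Because whole subtrees are skipped when their cell misses the circle, a query touches only cells near the region rather than every particle.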
Integrated Manufacturing and Development (IMaD)
ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331541
D. Moran, G. Ditlow, Daria R. Dooling, Ralph Williams, Tom Wilkins
This article describes an IBM-developed parallel software methodology called Integrated Manufacturing and Development (IMaD). IMaD is a super-scalable AIX application written in C that uses the Message Passing Interface (MPI). It is built specifically to support Product Engineering (PE) for full-scale integrated circuits (ICs), such as microprocessors, and encompasses failure, yield, and reliability design and analysis. Because an IC is constructed from smaller building-block circuits, many aspects of its behavior can be learned from a local analysis of some subset of the IC. However, some global aspects of an IC, such as its power-grid electrical distribution, require full-scale formulation, solution, evaluation, and visualization. This gives rise to very large problems that demand enormous computational and memory resources. To meet these demands, IMaD provides a parallel processing solution that incorporates novel methods both for topographic partitioning of the IC and for solving the global electrical simulation equations.
Citations: 0
Cache-Optimal Methods for Bit-Reversals
ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331558
Zhao Zhang, Xiaodong Zhang
Bit-reversals are representative and important data reordering operations in many scientific computations, and their performance degradation is mainly caused by cache conflict misses. Because bit-reversals are often used repeatedly as fundamental subroutines in scientific programs, cache-optimal methods and their implementations should be designed carefully and precisely at the programming level to obtain the best performance. This kind of performance programming for special programs such as data reorderings may significantly outperform optimization by an automatic tool such as a compiler. In this paper, we examine different methods that use blocking, buffering, and padding techniques for efficient implementations. We evaluate the merits and limits of each technique, together with the application- and architecture-dependent conditions for developing cache-optimal methods. We make two contributions: (1) our integrated blocking methods, which match cache associativity and TLB cache size and fully use the available registers, are cache-optimal and fast; (2) we show that our padding methods outperform other software-oriented methods, and we believe they are the fastest in terms of minimizing both CPU and memory access cycles.
Since the padding methods are almost independent of hardware, they could be widely used on many uniprocessor workstations and SMP multiprocessors.
Citations: 10
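For reference, the basic operation studied in the paper moves element i of an array of length n = 2^b to position rev(i), where rev reverses the b index bits. The Python sketch below is only this naive version; it does not implement the paper's blocking, buffering, or padding methods. Its writes for consecutive i land far apart in memory (rev(0) = 0 but rev(1) = n/2), which is the access pattern behind the cache conflict misses the abstract mentions.

```python
from typing import List, TypeVar

T = TypeVar("T")


def reverse_bits(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i (for bits=3, 0b001 becomes 0b100)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)  # move i's lowest bit into r's lowest position
        i >>= 1
    return r


def bit_reverse_permute(x: List[T]) -> List[T]:
    """Naive out-of-place bit-reversal: y[rev(i)] = x[i]; len(x) must be a power of two."""
    n = len(x)
    bits = n.bit_length() - 1
    if n == 0 or n != 1 << bits:
        raise ValueError("length must be a positive power of two")
    y = list(x)  # every slot is overwritten below because rev is a permutation
    for i in range(n):
        y[reverse_bits(i, bits)] = x[i]
    return y


print(bit_reverse_permute(list(range(8))))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Since bit-reversal is its own inverse, applying `bit_reverse_permute` twice restores the original order, which is a convenient correctness check for any optimized variant.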
MPI and Java-MPI: Contrasts and Comparisons of Low-Level Communication Performance
ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331553
V. Getov, Paul A. Gray, V. Sunderam
Java is receiving increasing attention as the most popular platform for distributed and collaborative computing. However, it still suffers from significant performance drawbacks compared with other programming languages such as C and Fortran. This paper presents the current status of our ongoing project, which aims to conduct a detailed experimental evaluation of the suitability of Java in these environments, with particular focus on its message-passing performance for one-to-one as well as one-to-many and many-to-many data exchange patterns. We also emphasize methodology and evaluation guidelines in order to ensure reproducibility, sound interpretation, and comparative analysis of performance results. We investigate some of the important parameters that characterize the communication performance of MPI and Java-MPI, such as latency, asymptotic bandwidth, and N-half. In addition, we introduce two types of pipeline effects, intra-message and inter-message, that have significant influence on message-passing performance.
For this purpose we have developed a low-level message-passing benchmark suite, which we have used to evaluate and compare different message-passing environments on the IBM SP-2.
Citations: 32
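Latency, asymptotic bandwidth, and N-half are commonly related through Hockney's linear model t(n) = t0 + n / r∞, in which N-half = t0 · r∞ is the message length at which half of the asymptotic bandwidth r∞ is reached. Assuming that model (the abstract does not say how the authors extracted the parameters), the Python sketch below fits it to ping-pong timings by ordinary least squares. The function name and the synthetic timings are ours, not measurements from the paper.

```python
from typing import List, Tuple


def fit_hockney(samples: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Least-squares fit of t(n) = t0 + n / r_inf to (message_bytes, seconds) pairs.

    Returns (t0 latency in s, r_inf asymptotic bandwidth in bytes/s, n_half in bytes).
    """
    k = len(samples)
    mean_n = sum(n for n, _ in samples) / k
    mean_t = sum(t for _, t in samples) / k
    cov = sum((n - mean_n) * (t - mean_t) for n, t in samples)
    var = sum((n - mean_n) ** 2 for n, _ in samples)
    slope = cov / var              # seconds per byte, i.e. 1 / r_inf
    t0 = mean_t - slope * mean_n   # intercept: startup latency
    r_inf = 1.0 / slope
    n_half = t0 * r_inf            # message size reaching half of r_inf
    return t0, r_inf, n_half


# Synthetic timings generated from t0 = 50 us and r_inf = 100 MB/s.
samples = [(n, 50e-6 + n / 100e6) for n in (0, 1_000, 10_000, 100_000)]
t0, r_inf, n_half = fit_hockney(samples)
print(f"latency {t0 * 1e6:.1f} us, bandwidth {r_inf / 1e6:.1f} MB/s, N-half {n_half:.0f} bytes")
```

With these exact synthetic timings the fit recovers N-half = 50 µs × 100 MB/s = 5,000 bytes; real measurements would need several repetitions per message size, and the pipeline effects the abstract mentions can make a single straight line a poor fit.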