ACM/IEEE SC 1999 Conference (SC'99): Latest Papers, Page 3

A Programmable Preprocessor for Parallelizing Fortran-90

ACM/IEEE SC 1999 Conference (SC'99) Pub Date : 1900-01-01 DOI: 10.1145/331532.331535

M. Rosing, Steve Yabusaki

A programmable preprocessor that generates portable and efficient parallel Fortran-90 code has been successfully used in the development of a variety of environmental transport simulators for the Department of Energy. The tool provides the basic functionality of a traditional preprocessor where directives are embedded in a serial Fortran program and interpreted by the preprocessor to produce parallel Fortran code with MPI calls. The unique aspect of this work is that the user can make additions to, or modify, these directives. The directives reside in a preprocessor library and changes to this library can range from small changes to customize an existing library, to larger changes for porting a library, to completely replacing the library. The preprocessor is programmed with a library of directives written in a C-like language, called DL, that has added support for manipulating Fortran code fragments. The primary benefits to the user are twofold: it is fairly easy for any user to generate efficient, parallel code from Fortran-90 with embedded directives, and the long-term viability of the user’s software is guaranteed. This is because the source code will always run on a serial machine (the directives are transparent to standard Fortran compilers), and the preprocessor library can be modified to work with different hardware and software environments. A 4000-line preprocessor library has been written and used to parallelize roughly 50,000 lines of groundwater modeling code. The programs have been ported to a wide range of parallel architectures. Performance of these programs is similar to programs explicitly written for a parallel machine. Binaries of the preprocessor core, as well as the preprocessor library source code used in our groundwater modeling codes, are currently available.

Citations: 6
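
The idea in the abstract above, directives that a serial compiler ignores and that a user-editable library expands into MPI calls, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `!PP$` sentinel, the directive names `barrier` and `global_sum`, and the expansions are all hypothetical, and the real tool's directive syntax and its DL language are not reproduced here.

```python
# Toy directive preprocessor. The "!PP$" sentinel and both directive names
# are hypothetical; they are not the syntax of the tool in the abstract.
SENTINEL = "!PP$"  # starts with "!", so a Fortran-90 compiler sees a comment


def expand_barrier(args):
    """Expand a barrier directive into an MPI barrier call."""
    return ["call MPI_Barrier(MPI_COMM_WORLD, ierr)"]


def expand_global_sum(args):
    """Expand 'global_sum x' into an in-place sum of scalar x over all ranks."""
    var = args[0]
    return [
        f"call MPI_Allreduce(MPI_IN_PLACE, {var}, 1, &",
        "                   MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD, ierr)",
    ]


# The directive library is plain data, so a user can add or replace entries.
LIBRARY = {"barrier": expand_barrier, "global_sum": expand_global_sum}


def preprocess(source, library):
    """Replace each known directive line with the code its expander returns."""
    out = []
    for line in source.splitlines():
        words = line.strip().split()
        if words and words[0] == SENTINEL and len(words) > 1 and words[1] in library:
            indent = line[: len(line) - len(line.lstrip())]
            out.extend(indent + text for text in library[words[1]](words[2:]))
        else:
            out.append(line)  # ordinary code and unknown directives pass through
    return "\n".join(out)


serial = """\
total = sum(local_values)
!PP$ global_sum total
!PP$ barrier
print *, total"""

print(preprocess(serial, LIBRARY))
```

Because the library is an ordinary dictionary, porting to another environment in this sketch amounts to passing a different mapping, for example `dict(LIBRARY, barrier=my_barrier)` with a user-supplied `my_barrier`.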

Parallelization of Radiance For Real Time Interactive Lighting Visualization Walkthroughs

ACM/IEEE SC 1999 Conference (SC'99) Pub Date : 1900-01-01 DOI: 10.1145/331532.331593

D. Robertson, K. Campbell, Stephen Lau, T. Ligocki

Citations: 16

Adaptive, Multiresolution Visualization of Large Data Sets using a Distributed Memory Octree

ACM/IEEE SC 1999 Conference (SC'99) Pub Date : 1900-01-01 DOI: 10.1145/331532.331592

L. Diachin, R. Loy

Citations: 44

A Parallel Implementation of the TOUGH2 Software Package for Large Scale Multiphase Fluid and Heat Flow Simulations

ACM/IEEE SC 1999 Conference (SC'99) Pub Date : 1900-01-01 DOI: 10.1145/331532.331584

E. Elmroth, C. Ding, Yu-Shu Wu, K. Pruess

Citations: 7

A New Switch Chip for IBM RS/6000 SP Systems

ACM/IEEE SC 1999 Conference (SC'99) Pub Date : 1900-01-01 DOI: 10.1145/331532.331548

C. Stunkel, Jay Herring, B. Abali, Rajeev Sivaram

This paper describes the architecture of a third-generation switching element which may appear in future IBM RS/6000 SP interconnection networks. In this paper this ASIC is referred to as the Switch3 switch chip. Like its predecessors, Switch3 is an 8-port device implementing output-queuing using the high-utilization central-buffering technique. However, Switch3 offers significant enhancements over these existing SP switch chips by incorporating advances in both VLSI technology and recent interconnection network research. Switch3 introduces a new form of adaptive routing with the potential to significantly improve network bandwidth. It also offers support for collective communication via a powerful hardware multicast replication capability. The technology advances allow link bandwidth to be improved to 500 MB/s per direction per link, and allow the central buffer size to be doubled compared to the current SP switch. Furthermore, the larger Switch3 input buffers are capable of supporting link lengths of up to 100 meters, enabling richly-connected, scalable topologies with a high aggregate bandwidth. Finally, Switch3 offers a number of other significant enhancements including limited support for high-priority traffic and detailed performance monitoring information.

Citations: 31
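
The abstract above ties input-buffer size to supported link length. One plausible reading is a bandwidth-delay argument: with link-level flow control, a receiver has to absorb whatever the sender can emit during one round trip. The short calculation below sketches that arithmetic. The flow-control model and the signal speed of 2.0e8 m/s (roughly two thirds of the speed of light, a common figure for cable) are assumptions, not details from the paper, and switch-internal latency is ignored.

```python
def bytes_in_flight(bandwidth_bytes_per_s, link_length_m, signal_speed_m_per_s=2.0e8):
    """Bytes a sender can emit during one round trip over the link.

    signal_speed_m_per_s is an assumed propagation speed, not a value
    taken from the paper.
    """
    round_trip_s = 2.0 * link_length_m / signal_speed_m_per_s
    return bandwidth_bytes_per_s * round_trip_s


# Figures from the abstract: 500 MB/s per direction, links of up to 100 meters.
needed = bytes_in_flight(500e6, 100.0)
print(f"roughly {needed:.0f} bytes in flight per link direction")
```

Under these assumptions the wire alone accounts for about 500 bytes per direction. A real design would add margin for logic and serialization delays at both ends.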

Industrial Seismic Imaging on Commodity Supercomputers

ACM/IEEE SC 1999 Conference (SC'99) Pub Date : 1900-01-01 DOI: 10.1145/331532.331543

S. Morton, Jeffrey R. Davis, Harry L. Duffey, Gary L. Donathan, Vic Forsyth, S. Checkles

Citations: 1

A Unifying Data Structure for Hierarchical Methods

ACM/IEEE SC 1999 Conference (SC'99) Pub Date : 1900-01-01 DOI: 10.1145/331532.331556

F. E. Sevilgen, S. Aluru

Citations: 8

Integrated Manufacturing and Development (IMaD)

ACM/IEEE SC 1999 Conference (SC'99) Pub Date : 1900-01-01 DOI: 10.1145/331532.331541

D. Moran, G. Ditlow, Daria R. Dooling, Ralph Williams, Tom Wilkins

Citations: 0

Cache-Optimal Methods for Bit-Reversals

ACM/IEEE SC 1999 Conference (SC'99) Pub Date : 1900-01-01 DOI: 10.1145/331532.331558

Zhao Zhang, Xiaodong Zhang

Bit-reversals are representative and important data reordering operations in many scientific computations. Performance degradation is mainly caused by cache conflict misses. Bit-reversals are often repeatedly used as fundamental subroutines for many scientific programs. Thus, in order to gain the best performance, cache-optimal methods and their implementations should be carefully and precisely done at the programming level. This type of performance programming for some special programs, such as the data reorderings, may significantly outperform an optimization from an automatic tool, such as a compiler. In this paper, we examine different methods using techniques of blocking, buffering, and padding for efficient implementations. We evaluate the merits and limits of each technique and their application- and architecture-dependent conditions for developing cache-optimal methods. We present two contributions in this paper: (1) Our integrated blocking methods, which match cache associativity and TLB cache size and which fully use the available registers, are cache-optimal and fast. (2) We show that our padding methods outperform other software-oriented methods, and believe they are the fastest in terms of minimizing both CPU and memory access cycles. Since the padding methods are almost independent of hardware, they could be widely used on many uniprocessor workstations and SMP multiprocessors.

Citations: 10
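
For readers unfamiliar with the operation in the abstract above, here is a minimal sketch of a naive, out-of-place bit-reversal reordering in Python. It is the textbook baseline, not the authors' blocking or padding methods, and the function names are chosen only for this illustration. The last lines show why the access pattern is hard on caches: consecutive source elements land at destinations separated by large power-of-two strides.

```python
def bit_reverse(i, bits):
    """Return i with its lowest `bits` bits in reversed order."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def bit_reversal_permute(x):
    """Naive out-of-place reordering: y[bit_reverse(i)] = x[i]."""
    n = len(x)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    y = [None] * n
    for i in range(n):
        y[bit_reverse(i, bits)] = x[i]
    return y


print(bit_reversal_permute(list(range(8))))  # [0, 4, 2, 6, 1, 5, 3, 7]

# Destinations of the first four source elements of a 16-element array.
print([bit_reverse(i, 4) for i in range(4)])  # [0, 8, 4, 12]
```

On a cache whose set count is a power of two, writes that stride by n/2 and n/4 tend to map to the same few sets, which is the kind of conflict-miss behavior the abstract's blocking and padding techniques target.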

MPI and Java-MPI: Contrasts and Comparisons of Low-Level Communication Performance

ACM/IEEE SC 1999 Conference (SC'99) Pub Date : 1900-01-01 DOI: 10.1145/331532.331553

V. Getov, Paul A. Gray, V. Sunderam

Java is receiving increasing attention as the most popular platform for distributed and collaborative computing. However, it is still subject to significant performance drawbacks in comparison to other programming languages such as C and Fortran. This paper represents the current status of our ongoing project which intends to conduct a detailed experimental evaluation on the suitability of Java in these environments, with particular focus on its message-passing performance for one-to-one as well as one-to-many and many-to-many data exchange patterns. We also emphasize both methodology and evaluation guidelines in order to ensure reproducibility, sound interpretation, and comparative analysis of performance results. Some of the important parameters which characterize the communication performance of MPI and Java-MPI such as latency, asymptotic bandwidth and N-half are investigated. In addition, we introduce two different types of pipeline effects (intra-message and inter-message) that have significant influence on the message-passing performance. For this purpose we have developed a low-level message-passing benchmark suite, which we have used to evaluate and compare different message-passing environments on the IBM SP-2.

Citations: 32
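
The three parameters named in the abstract above (latency, asymptotic bandwidth, and N-half) are commonly related through a linear timing model, T(n) = t0 + n / r_inf, in which N-half = t0 * r_inf is the message size that achieves half the asymptotic bandwidth. The sketch below fits that model by least squares. The use of this particular model is an assumption, since the abstract does not state how the authors derive the parameters, and the timings are synthetic values made up for the example, not measurements from the paper.

```python
def fit_linear_timing(sizes, times):
    """Least-squares fit of times ~ t0 + size / r_inf.

    Returns (t0, r_inf, n_half): startup latency in seconds, asymptotic
    bandwidth in bytes per second, and the half-performance message size.
    """
    n = len(sizes)
    mean_s = sum(sizes) / n
    mean_t = sum(times) / n
    sxx = sum((s - mean_s) ** 2 for s in sizes)
    sxy = sum((s - mean_s) * (t - mean_t) for s, t in zip(sizes, times))
    slope = sxy / sxx              # seconds per byte
    t0 = mean_t - slope * mean_s   # intercept: time for a zero-byte message
    r_inf = 1.0 / slope
    return t0, r_inf, t0 * r_inf


# Synthetic timings generated from t0 = 50 microseconds and r_inf = 100 MB/s.
sizes = [0, 1024, 65536, 1048576]
times = [50e-6 + s / 100e6 for s in sizes]

t0, r_inf, n_half = fit_linear_timing(sizes, times)
print(f"latency {t0 * 1e6:.1f} us, bandwidth {r_inf / 1e6:.1f} MB/s, N-half {n_half:.0f} bytes")
```

Real timings rarely follow a single straight line across all message sizes, so a fit like this is best read as a first-order summary.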