ACM/IEEE SC 1999 Conference (SC'99): Latest Publications

Very High Resolution Simulation of Compressible Turbulence on the IBM-SP System
ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331601
A. Mirin, R. Cohen, B. C. Curtis, W. Dannevik, A. Dimits, M. A. Duchauneau, D. Eliason, D. Schikore, S. E. Anderson, D. Porter, P. Woodward, L. Shieh, Steven W. White
{"title":"Very High Resolution Simulation of Compressible Turbulence on the IBM-SP System","authors":"A. Mirin, R. Cohen, B. C. Curtis, W. Dannevik, A. Dimits, M. A. Duchauneau, D. Eliason, D. Schikore, S. E. Anderson, D. Porter, P. Woodward, L. Shieh, Steven W. White","doi":"10.1145/331532.331601","DOIUrl":"https://doi.org/10.1145/331532.331601","url":null,"abstract":"Understanding turbulence and mix in compressible flows is of fundamental importance to real-world applications such as chemical combustion and supernova evolution. The ability to run in three dimensions and at very high resolution is required for the simulation to accurately represent the interaction of the various length scales, and consequently, the reactivity of the intermixin species. Toward this end, we have carried out a very high resolution (over 8 billion zones) 3-D simulation of the Richtmyer-Meshkov instability and turbulent mixing on the IBM Sustained Stewardship TeraOp (SST) system, developed under the auspices of the Department of Energy (DOE) Accelerated Strategic Computing Initiative (ASCI) and located at Lawrence Livermore National Laboratory. We have also undertaken an even higher resolution proof-of-principle calculation (over 24 billion zones) on 5832 processors of the IBM system, which executed for over an hour at a sustained rate of 1.05 Tflop/s, as well as a short calculation with a modified algorithm that achieved a sustained rate of 1.18Tflop/s. The full production scientific simulation, using a further modified algorithm, ran for 27,000 timesteps in slightly over a week of wall time using 3840 processors of the IBM system, clockin a sustained throughput of roughly 0.6 teraflop per second (32-bit arithmetic). Nearly 300,000 graphics files comprising over three terabytes of data were produced and post-processed. The capability of running in 3-D at high resolution enabled us to get a more accurate and detailed picture of the fluid-flow structure - in particular, to simulate the development of fine scale structures from the interactions of long-and short-wavelength phenomena, to elucidate differences between two-dimensional and three-dimensional turbulence, to explore a conjecture regarding the transition from unstable flow to fully developed turbulence with increasing Reynolds number, and to ascertain convergence of the computed solution with respect to mesh resolution.","PeriodicalId":354898,"journal":{"name":"ACM/IEEE SC 1999 Conference (SC'99)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116459256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 92
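As a back-of-the-envelope check on the figures quoted in the abstract (over 8 billion zones, 27,000 timesteps, slightly over a week of wall time, roughly 0.6 Tflop/s sustained), the following minimal C sketch derives the implied floating-point work per zone per timestep. The 7-day wall time is an assumption standing in for "slightly over a week", and the derived per-zone cost is illustrative only; it is not a figure reported by the authors.

```c
#include <stdio.h>

int main(void) {
    /* Figures quoted in the abstract; the 7-day wall time is an assumption
       standing in for "slightly over a week". */
    const double zones      = 8.0e9;          /* > 8 billion zones        */
    const double timesteps  = 27000.0;        /* production run length    */
    const double walltime_s = 7.0 * 86400.0;  /* assumed 7 days           */
    const double rate_flops = 0.6e12;         /* ~0.6 Tflop/s sustained   */

    /* Total floating-point operations implied by the sustained rate. */
    double total_flops = rate_flops * walltime_s;

    /* Implied cost per zone per timestep -- a back-of-the-envelope
       illustration, not a number reported by the authors. */
    double flops_per_zone_step = total_flops / (zones * timesteps);

    printf("implied flops per zone per timestep: %.0f\n", flops_per_zone_step);
    return 0;
}
```

With these assumptions the sketch prints roughly 1,700 operations per zone per timestep.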
Sun MPI I/O: Efficient I/O for Parallel Applications
ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331546
L. Wisniewski, Brad Smisloff, N. Nieuwejaar
{"title":"SunTM MPI I/O: Efficient I/O for Parallel Applications","authors":"L. Wisniewski, Brad Smisloff, N. Nieuwejaar","doi":"10.1145/331532.331546","DOIUrl":"https://doi.org/10.1145/331532.331546","url":null,"abstract":"Many parallel applications require high-performance I/O to avoid negating some or all of the benefit derived from parallelizing its computation. When these applications are run on a loosely-coupled cluster of SMPs, the limitations of existing hardware and software present even more hurdles to performing high-performance I/O. In this paper, we describe our full implementation of the I/O portion of the MPI-2 specification. In particular, we discuss the limitations inherent in performing high-performance I/O on a cluster of SMPs and demonstrate the benefits of using a cluster-based filesystem over a traditional node-based filesystem.","PeriodicalId":354898,"journal":{"name":"ACM/IEEE SC 1999 Conference (SC'99)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123047255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
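For readers unfamiliar with the MPI-2 I/O interface that the paper implements, here is a minimal sketch of a collective parallel write in C: each rank writes its own contiguous block of a shared file with MPI_File_write_at_all. The filename and block size are arbitrary placeholders; the sketch shows the standard interface only, not anything specific to the Sun implementation.

```c
#include <mpi.h>
#include <stdlib.h>

#define BLOCK 1024  /* doubles per rank; arbitrary for illustration */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank fills its own block of data. */
    double *buf = malloc(BLOCK * sizeof(double));
    for (int i = 0; i < BLOCK; i++)
        buf[i] = rank + i * 1e-6;

    /* Open one shared file across all ranks. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write: rank r writes the r-th contiguous block.  The
       collective form lets the library aggregate requests, which is where
       a cluster-based filesystem can help. */
    MPI_Offset offset = (MPI_Offset)rank * BLOCK * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, BLOCK, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```

Run with, for example, mpirun -np 4 ./a.out; the collective variant gives the MPI library the opportunity to merge the per-rank requests into larger, better-aligned file operations.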
A Cost-Benefit Scheme for High Performance Predictive Prefetching
ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331582
V. Vellanki, A. Chervenak
{"title":"A Cost-Benefit Scheme for High Performance Predictive Prefetching","authors":"V. Vellanki, A. Chervenak","doi":"10.1145/331532.331582","DOIUrl":"https://doi.org/10.1145/331532.331582","url":null,"abstract":"High-performance computing systems will increasingly rely on prefetching data from disk to overcome long disk access times and maintain high utilization of parallel I/O systems. This paper evaluates a prefetching technique that chooses which blocks to prefetch based on their probability of access and decides whether to prefetch a particular block at a given time using a cost-benefit analysis. The algorithm uses a probability tree to record past accesses and to predict future access patterns. We simulate this prefetching algorithm with a variety of I/O traces. We show that our predictive prefetching scheme combined with simple one-block-lookahead prefetching produces good performance for a variety of workloads. The scheme reduces file cache miss rates by up to 36% for workloads that receive no benefit from sequential prefetching. We showthat the memory requirements for building the probability tree are reasonable, requiring about a megabyte for good performance. The probability tree constructed by the prefetching scheme predicts around 60-70% of the accesses. Next, we discuss ways of improving the performance of the prefetching scheme. Finally, we show that the cost-benefit analysis enables the tree-based prefetching scheme to perform an optimal amount of prefetching.","PeriodicalId":354898,"journal":{"name":"ACM/IEEE SC 1999 Conference (SC'99)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123390638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
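The abstract describes a predictor (a probability tree over past accesses) combined with a cost-benefit test for whether to issue a prefetch. The C sketch below substitutes a much simpler first-order successor-count table for the probability tree and uses assumed cost constants, so it illustrates only the shape of the cost-benefit decision, not the paper's actual data structure or parameters.

```c
#include <stdio.h>

#define NBLOCKS 256            /* toy block-address space */

/* Successor counts: count[a][b] = times block b followed block a.  This is a
   first-order stand-in for the paper's probability tree, used only to show
   the cost-benefit decision. */
static int count[NBLOCKS][NBLOCKS];
static int total[NBLOCKS];

/* Illustrative cost model (arbitrary units, assumed values). */
static const double MISS_PENALTY  = 10.0;  /* saved if the prefetch hits   */
static const double PREFETCH_COST = 4.0;   /* I/O + cache-pollution cost   */

/* Record an observed transition prev -> cur. */
static void record(int prev, int cur) {
    count[prev][cur]++;
    total[prev]++;
}

/* Decide whether to prefetch the most likely successor of 'cur'.
   Returns the block to prefetch, or -1 if the expected benefit
   (probability * miss penalty) does not cover the cost. */
static int prefetch_decision(int cur) {
    if (total[cur] == 0) return -1;
    int best = 0;
    for (int b = 1; b < NBLOCKS; b++)
        if (count[cur][b] > count[cur][best]) best = b;
    double p = (double)count[cur][best] / total[cur];
    return (p * MISS_PENALTY > PREFETCH_COST) ? best : -1;
}

int main(void) {
    /* Train on a short synthetic trace, then query the predictor. */
    int trace[] = {3, 7, 3, 7, 3, 9, 3, 7};
    int n = sizeof trace / sizeof trace[0];
    for (int i = 1; i < n; i++) record(trace[i - 1], trace[i]);

    printf("after block 3, prefetch: %d\n", prefetch_decision(3));
    return 0;
}
```

In this toy trace, block 7 follows block 3 with probability 0.75, so the expected benefit (7.5) exceeds the assumed cost (4.0) and the prefetch is issued.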
Direct Numerical Simulation of Turbulence with a PC/Linux Cluster: Fact or Fiction?
ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331585
G. Karamanos, C. Evangelinos, R. C. Boes, R. Kirby, G. Karniadakis
{"title":"Direct Numerical Simulation of Turbulence with a PC/Linux Cluster: Fact or Fiction?","authors":"G. Karamanos, C. Evangelinos, R. C. Boes, R. Kirby, G. Karniadakis","doi":"10.1145/331532.331585","DOIUrl":"https://doi.org/10.1145/331532.331585","url":null,"abstract":"Direct Numerical Simulation (DNS) of turbulence requires many CPU days and Gigabytes of memory. These requirements limit most DNS to using supercomputers, available at supercomputer centres. With the rapid development and low cost of PCs, PC clusters are evaluated as a viable low-cost option for scientific computing. Both low-end and high-end PC clusters, ranging from 2 to 128 processors, are compared to a range of existing supercomputers, such as the IBM SP nodes, Silicon Graphics Origin 2000, Fujitsu AP3000 and Cray T3E. The comparison concentrates on CPU and communication performance. At the kernel level, BLAS libraries are used for CPU performance evaluation. Regarding communication, the free implementations of MPICH and LAM are used on fast-ethernet-based systems and compared to myrinet-based and supercomputer networks. At the application level, serial and parallel simulations are performed on state of the art DNS, such as turbulent wake flows in stationary and moving computational domains.","PeriodicalId":354898,"journal":{"name":"ACM/IEEE SC 1999 Conference (SC'99)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126561750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
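As an example of the kernel-level BLAS measurement mentioned in the abstract, the following C sketch times a single DGEMM and reports the achieved Mflop/s. It assumes a CBLAS implementation (e.g. OpenBLAS) is installed and linked; the matrix size is arbitrary.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <cblas.h>   /* assumes a CBLAS implementation is available */

#define N 512        /* matrix dimension; arbitrary for illustration */

int main(void) {
    double *A = malloc(N * N * sizeof(double));
    double *B = malloc(N * N * sizeof(double));
    double *C = malloc(N * N * sizeof(double));
    for (int i = 0; i < N * N; i++) { A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    /* C = A * B, the kernel commonly used to gauge peak CPU throughput. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                N, N, N, 1.0, A, N, B, N, 0.0, C, N);

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;

    /* DGEMM performs 2*N^3 floating-point operations. */
    printf("%.1f Mflop/s\n", 2.0 * (double)N * N * N / secs / 1e6);

    free(A); free(B); free(C);
    return 0;
}
```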
Optimization of MPI Collectives on Clusters of Large-Scale SMP's
ACM/IEEE SC 1999 Conference (SC'99) Pub Date: 1999 DOI: 10.1145/331532.331555
S. Sistare, Rolf vande Vaart, E. Loh
{"title":"Optimization of MPI Collectives on Clusters of Large-Scale SMP’s","authors":"S. Sistare, Rolf vande Vaart, E. Loh","doi":"10.1145/331532.331555","DOIUrl":"https://doi.org/10.1145/331532.331555","url":null,"abstract":"Implementors of message-passing libraries have focused on optimizing point-to-point protocols and have largely ignored the performance of collective operations. In addition, algorithms for collectives have been tuned to run well on networks of uni-processor machines, ignoring the performance that may be gained on large-scale SMP’s in wide-spread use as compute nodes. This is unfortunate, because the high backplane bandwidths and shared-memory capabilities of large SMP’s are a perfect match for the requirements of collectives. We present new algorithms for MPI collective operations that take advantage of the capabilities of fat-node SMP’s and provide models that show the characteristics of the old and new algorithms. Using the SunTM MPI library, we present results on a 64-way StarfireTM SMP and a 4-node cluster of 8-way Sun EnterpriseTM 4000 nodes that show performance improvements ranging typically from 2x to 5x for the collectives we studied.","PeriodicalId":354898,"journal":{"name":"ACM/IEEE SC 1999 Conference (SC'99)","volume":"796 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113999158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 86
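The paper's premise is that collectives can exploit the shared memory inside each SMP node by operating hierarchically: combine within a node first, then across nodes. The C sketch below shows that two-level structure for a simple allreduce. It uses the MPI-3 call MPI_Comm_split_type to build the per-node communicator, which postdates SC'99, and it is not the authors' algorithm, only an illustration of the node-aware pattern.

```c
#include <mpi.h>
#include <stdio.h>

/* Two-level allreduce of a single double: reduce on the node, allreduce
   across node leaders, broadcast back on the node.  MPI_Comm_split_type
   is an MPI-3 call used here only to identify ranks sharing a node. */
static double hierarchical_allreduce(double x, MPI_Comm comm) {
    MPI_Comm node;                       /* ranks sharing a node */
    MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node);

    int node_rank;
    MPI_Comm_rank(node, &node_rank);

    MPI_Comm leaders;                    /* one rank per node */
    MPI_Comm_split(comm, node_rank == 0 ? 0 : MPI_UNDEFINED, 0, &leaders);

    double node_sum = 0.0, global_sum = 0.0;
    MPI_Reduce(&x, &node_sum, 1, MPI_DOUBLE, MPI_SUM, 0, node);
    if (node_rank == 0) {
        MPI_Allreduce(&node_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, leaders);
        MPI_Comm_free(&leaders);
    }
    MPI_Bcast(&global_sum, 1, MPI_DOUBLE, 0, node);

    MPI_Comm_free(&node);
    return global_sum;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double sum = hierarchical_allreduce((double)rank, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %.0f\n", sum);
    MPI_Finalize();
    return 0;
}
```

The design point this pattern exploits is the same one the paper names: intra-node traffic stays on the SMP backplane or in shared memory, so only one message per node crosses the slower inter-node network.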