{"title":"14.9 TFLOPS Three-Dimensional Fluid Simulation for Fusion Science with HPF on the Earth Simulator","authors":"H. Sakagami, H. Murai, Y. Seo, M. Yokokawa","doi":"10.1109/SC.2002.10051","DOIUrl":"https://doi.org/10.1109/SC.2002.10051","url":null,"abstract":"We succeeded in getting 14.9 TFLOPS performance when running a plasma simulation code IMPACT-3D parallelized with High Performance Fortran on 512 nodes of the Earth Simulator. The theoretical peak performance of the 512 nodes is 32 TFLOPS, which means 45% of the peak performance was obtained with HPF. IMPACT-3D is an implosion analysis code using the TVD scheme, which performs three-dimensional compressible and inviscid Eulerian fluid computation with an explicit 5-point stencil scheme for spatial differentiation and the fractional time step for time integration. The mesh size is 2048x2048x4096, and the third dimension was distributed for parallelization. The HPF system used in the evaluation is HPF/ES, developed for the Earth Simulator by enhancing NEC HPF/SX V2, mainly in communication scalability. Shift communications were manually tuned for best performance using the HPF/JA extensions, which were designed to give users more control over sophisticated parallelization and communication optimizations.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133289411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Scalable Parallel Fast Multipole Method for Analysis of Scattering from Perfect Electrically Conducting Surfaces","authors":"B. Hariharan, S. Aluru, B. Shanker","doi":"10.1109/SC.2002.10012","DOIUrl":"https://doi.org/10.1109/SC.2002.10012","url":null,"abstract":"In this paper, we develop a parallel Fast Multipole Method (FMM) based solution for computing the scattered electromagnetic fields from a Perfect Electrically Conducting (PEC) surface. The main contributions of this work are the development of parallel algorithms with the following characteristics: 1) provably efficient worst-case run-time irrespective of the shape of the scatterer, 2) communication efficiency, and 3) guaranteed load balancing within a small constant factor. We have developed a scalable, parallel code and validated it against surfaces for which the solution can be computed analytically, and against serial software. The efficiency and scalability of the code are demonstrated with experimental results on an IBM xSeries cluster. Though developed in the context of this particular application, our algorithms can be used in other applications involving parallel FMM.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"178 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123199691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Density Computing: A 240-Processor Beowulf in One Cubic Meter","authors":"Michael S. Warren, E. Weigle, Wu-chun Feng","doi":"10.1109/SC.2002.10010","DOIUrl":"https://doi.org/10.1109/SC.2002.10010","url":null,"abstract":"We present results from computations on Green Destiny, a 240-processor Beowulf cluster which is contained entirely within a single 19-inch wide 42U rack. The cluster consists of 240 Transmeta TM5600 667-MHz CPUs mounted on RLX Technologies motherboard blades. The blades are mounted side-by-side in an RLX 3U rack-mount chassis, which holds 24 blades. The overall cluster contains 10 chassis and associated Fast and Gigabit Ethernet switches. The system has a footprint of 0.5 m² (6 square feet), a volume of 0.85 m³ (30 cubic feet) and a measured power dissipation under load of 5200 watts (including network switches). We have measured the performance of the cluster using a gravitational treecode N-body simulation of galaxy formation using 200 million particles, which sustained an average of 38.9 Gflops on 212 nodes of the system. We also present results from a three-dimensional hydrodynamic simulation of a core-collapse supernova.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123339917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ultra-High Performance Communication with MPI and the Sun Fire™ Link Interconnect","authors":"S. Sistare, C. Jackson","doi":"10.1109/SC.2002.10049","DOIUrl":"https://doi.org/10.1109/SC.2002.10049","url":null,"abstract":"We present a new low-latency system area network that provides the ultra-high bandwidth needed to fuse a collection of large SMP servers into a capability cluster. The network adapter exports a remote shared memory (RSM) model that supports low latency kernel bypass messaging. The Sun™ MPI library uses the RSM interface to implement a highly efficient memory-to-memory messaging protocol in which the library directly manages buffers and data structures in remote memory. This allows flexible allocation of buffer space to active connections, while avoiding resource contention that could otherwise increase latencies. We discuss the characteristics of the interconnect, describe the MPI protocols, and measure the performance of a number of MPI benchmarks. Our results include MPI inter-node bandwidths of almost 3 Gigabytes per second and MPI ping-pong latencies as low as 3.7 microseconds.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123389107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active Harmony: Towards Automated Performance Tuning","authors":"Cristian Tapus, I. Chung, J. Hollingsworth","doi":"10.1109/SC.2002.10062","DOIUrl":"https://doi.org/10.1109/SC.2002.10062","url":null,"abstract":"In this paper, we present the Active Harmony automated runtime tuning system. We describe the interface used by programs to make applications tunable. We present the Library Specification Layer, which helps library developers expose multiple variations of the same API using different algorithms. The Library Specification Language helps to select the most appropriate library to tune the overall performance. We also present the optimization algorithm used to adjust parameters in the application and the libraries. Finally, we present results that show how the system is able to tune several real applications. The automated tuning system is able to tune the application parameters to within a few percent of the best value after evaluating only 11 out of over 1,700 possible configurations.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129226680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Disk Cache Replacement Algorithm for Storage Resource Managers in Data Grids","authors":"E. Otoo, F. Olken, A. Shoshani","doi":"10.1109/SC.2002.10043","DOIUrl":"https://doi.org/10.1109/SC.2002.10043","url":null,"abstract":"We address the problem of cache replacement policies for Storage Resource Managers (SRMs) that are used in Data Grids. An SRM has disk storage of bounded capacity that retains some N objects. A replacement policy is applied to determine which object in the cache needs to be evicted when space is needed. We define a utility function for ranking the candidate objects for eviction and then describe an efficient algorithm for computing the replacement policy based on this function. This computation takes O(log N) time. We compare our policy with traditional replacement policies such as Least Frequently Used (LFU), Least Recently Used (LRU), LRU-K, Greedy Dual Size (GDS), etc., using simulations of both synthetic and real workloads of file accesses to tertiary storage. Our simulations of replacement policies account for delays in cache space reservation, data transfer and processing. The results obtained show that our proposed method is the most cost-effective cache replacement policy for Storage Resource Managers (SRMs).","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122249781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}