ACM/IEEE SC 2002 Conference (SC'02): Latest Papers

16.4-Tflops Direct Numerical Simulation of Turbulence by a Fourier Spectral Method on the Earth Simulator
ACM/IEEE SC 2002 Conference (SC'02) Pub Date : 2002-11-16 DOI: 10.1109/SC.2002.10052
M. Yokokawa, K. Itakura, Atsuya Uno, T. Ishihara, Y. Kaneda
{"title":"16.4-Tflops Direct Numerical Simulation of Turbulence by a Fourier Spectral Method on the Earth Simulator","authors":"M. Yokokawa, K. Itakura, Atsuya Uno, T. Ishihara, Y. Kaneda","doi":"10.1109/SC.2002.10052","DOIUrl":"https://doi.org/10.1109/SC.2002.10052","url":null,"abstract":"The high-resolution direct numerical simulations (DNSs) of incompressible turbulence with numbers of grid points up to 4096^3 have been executed on the Earth Simulator (ES). The DNSs are based on the Fourier spectral method, so that the equation for mass conservation is accurately solved. In DNS based on the spectral method, most of the computation time is consumed in calculating the three-dimensional (3D) Fast Fourier Transform (FFT), which requires huge-scale global data transfer and has been the major stumbling block that has prevented truly high-performance computing. By implementing new methods to efficiently perform the 3D-FFT on the ES, we have achieved DNS at 16.4 Tflops on 2048^3 grid points. The DNS yields an energy spectrum exhibiting a wide inertial subrange, in contrast to previous DNSs with lower resolutions, and therefore provides valuable data for the study of the universal features of turbulence at large Reynolds number.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115532058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 153
MPI and OpenMP Paradigms on Cluster of SMP Architectures: The Vacancy Tracking Algorithm for Multi-Dimensional Array Transposition
ACM/IEEE SC 2002 Conference (SC'02) Pub Date : 2002-11-16 DOI: 10.12694/SCPE.V5I2.276
Yun He, C. Ding
{"title":"MPI and OpenMP Paradigms on Cluster of SMP Architectures: The Vacancy Tracking Algorithm for Multi-Dimensional Array Transposition","authors":"Yun He, C. Ding","doi":"10.12694/SCPE.V5I2.276","DOIUrl":"https://doi.org/10.12694/SCPE.V5I2.276","url":null,"abstract":"We investigate remapping multi-dimensional arrays on cluster of SMP architectures under OpenMP, MPI, and hybrid paradigms. Traditional method of array transpose needs an auxiliary array of the same size and a copy back stage. We recently developed an in-place method using vacancy tracking cycles. The vacancy tracking algorithm outperforms the traditional 2-array method as demonstrated by extensive comparisons. The independence of vacancy tracking cycles allows efficient parallelization of the in-place method on SMP architectures at node level. Performance of multi-threaded parallelism using OpenMP are tested with different scheduling methods and different number of threads. The vacancy tracking method is parallelized using several parallel paradigms. At node level, pure OpenMP outperforms pure MPI by a factor of 2.76. Across entire cluster of SMP nodes, the hybrid MPI/OpenMP implementation outperforms pure MPI by a factor of 4.44, demonstrating the validity of the parallel paradigm of mixing MPI with OpenMP.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"767 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120882880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 37
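The in-place transposition idea in the abstract above can be illustrated with a minimal sketch. It assumes a 2D array stored row-major in a flat Python list and follows permutation cycles, moving each element into the vacancy left by the previous move. The function name `transpose_in_place` and the `visited` bookkeeping are illustrative assumptions; the abstract does not say how the authors track cycles, and this single-threaded sketch does not reproduce their OpenMP/MPI parallelization.

```python
def transpose_in_place(a, rows, cols):
    """Transpose a rows x cols row-major flat list in place.

    After the call, `a` holds the cols x rows transpose in row-major order.
    Illustrative sketch only: the `visited` flags are a simplification and
    are not taken from the paper.
    """
    n = rows * cols
    last = n - 1                  # first and last elements never move
    visited = bytearray(n)        # marks positions that are already final
    for start in range(1, last):
        if visited[start]:
            continue
        saved = a[start]          # lifting this element opens a vacancy
        vacancy = start
        while True:
            visited[vacancy] = 1
            # Original index of the element that belongs at `vacancy`.
            src = (vacancy * cols) % last
            if src == start:
                break
            a[vacancy] = a[src]   # fill the vacancy; it moves to `src`
            vacancy = src
        a[vacancy] = saved        # close the cycle


if __name__ == "__main__":
    m = [0, 1, 2,
         3, 4, 5]                 # 2 x 3 matrix
    transpose_in_place(m, 2, 3)
    print(m)                      # 3 x 2 transpose, row-major
```

Because distinct cycles touch disjoint positions, they can in principle be processed independently, which is the property the abstract credits for efficient node-level parallelization.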
A High-Level Approach to Synthesis of High-Performance Codes for Quantum Chemistry
ACM/IEEE SC 2002 Conference (SC'02) Pub Date : 2002-11-16 DOI: 10.1109/SC.2002.10056
Gerald Baumgartner, D. Bernholdt, D. Cociorva, R. Harrison, S. Hirata, Chi-Chung Lam, M. Nooijen, R. Pitzer, J. Ramanujam, P. Sadayappan
{"title":"A High-Level Approach to Synthesis of High-Performance Codes for Quantum Chemistry","authors":"Gerald Baumgartner, D. Bernholdt, D. Cociorva, R. Harrison, S. Hirata, Chi-Chung Lam, M. Nooijen, R. Pitzer, J. Ramanujam, P. Sadayappan","doi":"10.1109/SC.2002.10056","DOIUrl":"https://doi.org/10.1109/SC.2002.10056","url":null,"abstract":"This paper discusses an approach to the synthesis of high-performance parallel programs for a class of computations encountered in quantum chemistry and physics. These computations are expressible as a set of tensor contractions and arise in electronic structure modeling. An overview is provided of the synthesis system, that transforms a high-level specification of the computation into high-performance parallel code, tailored to the characteristics of the target architecture. An example from computational chemistry is used to illustrate how different code structures are generated under different assumptions of available memory on the target computer.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"154 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116634028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 61
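As a toy illustration of how one tensor contraction can yield different code structures under different memory assumptions, the sketch below evaluates R[i] = sum over j, k of A[i][j] * B[j][k] * C[k] in two ways. The example contraction, the function names, and the choice of loop fusion as the transformation are assumptions made for illustration; the abstract does not specify which transformations the synthesis system applies.

```python
def contract_with_array_temp(A, B, C):
    """Two-step evaluation: stores the full intermediate T[j]."""
    T = [sum(B[j][k] * C[k] for k in range(len(C))) for j in range(len(B))]
    return [sum(A[i][j] * T[j] for j in range(len(T))) for i in range(len(A))]


def contract_fused(A, B, C):
    """Fused evaluation: the intermediate shrinks to a single scalar `t`."""
    R = [0] * len(A)
    for j in range(len(B)):
        t = sum(B[j][k] * C[k] for k in range(len(C)))  # one element of T
        for i in range(len(A)):
            R[i] += A[i][j] * t
    return R


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[1, 0], [0, 1]]
    C = [5, 6]
    print(contract_with_array_temp(A, B, C))
    print(contract_fused(A, B, C))
```

Both versions perform the same arithmetic; the fused form trades the temporary array for a scalar, which matters when intermediates are too large for the target machine's memory.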
ICENI: An Open Grid Service Architecture Implemented with Jini
ACM/IEEE SC 2002 Conference (SC'02) Pub Date : 2002-11-16 DOI: 10.1109/SC.2002.10027
N. Furmento, William Lee, A. Mayer, S. Newhouse, J. Darlington
{"title":"ICENI: An Open Grid Service Architecture Implemented with Jini","authors":"N. Furmento, William Lee, A. Mayer, S. Newhouse, J. Darlington","doi":"10.1109/SC.2002.10027","DOIUrl":"https://doi.org/10.1109/SC.2002.10027","url":null,"abstract":"The move towards Service Grids, where services are composed to meet the requirements of a user community within constraints specified by the resource provider, present many challenges to service provision and description. To support our research activities in the autonomous composition of services to form a Semantic Service Grid we describe the adoption within ICENI of web services to enable interoperability with the recently proposed Open Grid Services Architecture.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123632650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 119
The Web Service Discovery Architecture
ACM/IEEE SC 2002 Conference (SC'02) Pub Date : 2002-11-16 DOI: 10.1109/SC.2002.10033
Wolfgang Hoschek
{"title":"The Web Service Discovery Architecture","authors":"Wolfgang Hoschek","doi":"10.1109/SC.2002.10033","DOIUrl":"https://doi.org/10.1109/SC.2002.10033","url":null,"abstract":"In this paper, we propose the Web Service Discovery Architecture (WSDA). At runtime, Grid applications can use this architecture to discover and adapt to remote services. WSDA promotes an interoperable web service discovery layer by defining appropriate services, interfaces, operations and protocol bindings, based on industry standards. It is unified because it subsumes an array of disparate concepts, interfaces and protocols under a single semi-transparent umbrella. It is modular because it defines a small set of orthogonal multi-purpose communication primitives (building blocks) for discovery. These primitives cover service identification, service description retrieval, data publication as well as minimal and powerful query support. The architecture is open and flexible because each primitive can be used, implemented, customized and extended in many ways. It is powerful because the individual primitives can be combined and plugged together by specific clients and services to yield a wide range of behaviors and emerging synergies.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124768894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 102
A Decoupled Scheduling Approach for the GrADS Program Development Environment
ACM/IEEE SC 2002 Conference (SC'02) Pub Date : 2002-11-16 DOI: 10.1109/SC.2002.10009
H. Dail, H. Casanova, F. Berman
{"title":"A Decoupled Scheduling Approach for the GrADS Program Development Environment","authors":"H. Dail, H. Casanova, F. Berman","doi":"10.1109/SC.2002.10009","DOIUrl":"https://doi.org/10.1109/SC.2002.10009","url":null,"abstract":"Program development environments are instrumental in providing users with easy and efficient access to parallel computing platforms. While a number of such environments have been widely accepted and used for traditional HPC systems, there are currently no widely used environments for Grid programming. The goal of the Grid Application Development Software (GrADS) project is to develop a coordinated set of tools, libraries and run-time execution facilities for Grid program development. In this paper, we describe a Grid scheduler component that is integrated as part of the GrADS software system. Traditionally, application-level schedulers (e.g. AppLeS) have been tightly integrated with the application itself and were not easily applied to other applications. Our design is generic: we decouple the scheduler core (the search procedure) from the application-specific (e.g. application performance models) and platform-specific (e.g. collection of resource information) components used by the search procedure. We provide experimental validation of our approach for two representative regular, iterative parallel programs in a variety of real-world Grid testbeds. Our scheduler consistently outperforms static and user-driven scheduling methods.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124877110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 75
Scaling the Unscalable: A Case Study on the AlphaServer SC
ACM/IEEE SC 2002 Conference (SC'02) Pub Date : 2002-11-16 DOI: 10.1109/SC.2002.10035
P. Worley
{"title":"Scaling the Unscalable: A Case Study on the AlphaServer SC","authors":"P. Worley","doi":"10.1109/SC.2002.10035","DOIUrl":"https://doi.org/10.1109/SC.2002.10035","url":null,"abstract":"A case study of the optimization of a climate modeling application on the Compaq AlphaServer SC at the Pittsburgh Supercomputer Center is used to illustrate tools and techniques that are important to achieving good performance scaling.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125988203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
UPC Performance and Potential: A NPB Experimental Study
ACM/IEEE SC 2002 Conference (SC'02) Pub Date : 2002-11-16 DOI: 10.1109/SC.2002.10034
T. El-Ghazawi, François Cantonnet
{"title":"UPC Performance and Potential: A NPB Experimental Study","authors":"T. El-Ghazawi, François Cantonnet","doi":"10.1109/SC.2002.10034","DOIUrl":"https://doi.org/10.1109/SC.2002.10034","url":null,"abstract":"UPC, or Unified Parallel C, is a parallel extension of ANSI C. UPC follows a distributed shared memory programming model aimed at leveraging the ease of programming of the shared memory paradigm, while enabling the exploitation of data locality. UPC incorporates constructs that allow placing data near the threads that manipulate them to minimize remote accesses. This paper gives an overview of the concepts and features of UPC and establishes, through extensive performance measurements of NPB workloads, the viability of the UPC programming language compared to the other popular paradigms. Further, through performance measurements we identify the challenges, the remaining steps and the priorities for UPC. It will be shown that with proper hand tuning and optimized collective operations libraries, UPC performance will be comparable to that of MPI. Furthermore, by incorporating such improvements into automatic compiler optimizations, UPC will compare quite favorably to message passing in ease of programming.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116793424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 141
A 29.5 Tflops Simulation of Planetesimals in Uranus-Neptune Region on GRAPE-6
ACM/IEEE SC 2002 Conference (SC'02) Pub Date : 2002-11-16 DOI: 10.1109/SC.2002.10022
J. Makino, E. Kokubo, T. Fukushige, H. Daisaka
{"title":"A 29.5 Tflops Simulation of Planetesimals in Uranus-Neptune Region on GRAPE-6","authors":"J. Makino, E. Kokubo, T. Fukushige, H. Daisaka","doi":"10.1109/SC.2002.10022","DOIUrl":"https://doi.org/10.1109/SC.2002.10022","url":null,"abstract":"As an entry for the 2002 Gordon Bell performance prize, we report the performance achieved on the GRAPE-6 system for a simulation of the early evolution of the protoplanet-planetesimal system of the Uranus-Neptune region. GRAPE-6 is a special-purpose computer for astrophysical N-body calculations. The present configuration has 2048 custom pipeline chips, each containing six pipeline processors for the calculation of gravitational interactions between particles. Its theoretical peak performance is 63.4 Tflops. The actual performance obtained was 29.5 Tflops, for a simulation of the early evolution of outer Solar system with 1.8 million planetesimals and two massive protoplanets.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132690162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
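The figures quoted in the GRAPE-6 abstract above allow a quick back-of-the-envelope check of sustained efficiency, which comes to about 46.5% of theoretical peak. The short sketch below only rearranges numbers stated in the abstract; the variable names are illustrative, and the per-pipeline figure assumes peak performance is spread evenly over all pipelines, which the abstract does not state.

```python
# Figures taken directly from the abstract.
CHIPS = 2048
PIPELINES_PER_CHIP = 6
PEAK_TFLOPS = 63.4
SUSTAINED_TFLOPS = 29.5

total_pipelines = CHIPS * PIPELINES_PER_CHIP
efficiency = SUSTAINED_TFLOPS / PEAK_TFLOPS
# Assumption: peak is divided evenly across pipelines.
peak_gflops_per_pipeline = PEAK_TFLOPS * 1000 / total_pipelines

print(f"pipelines: {total_pipelines}")
print(f"sustained / peak: {efficiency:.1%}")
print(f"peak per pipeline: {peak_gflops_per_pipeline:.2f} Gflops")
```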
Separated High-Bandwidth and Low-Latency Communication in the Cluster Interconnect Clint
ACM/IEEE SC 2002 Conference (SC'02) Pub Date : 2002-11-16 DOI: 10.1109/SC.2002.10042
H. Eberle, N. Gura
{"title":"Separated High-Bandwidth and Low-Latency Communication in the Cluster Interconnect Clint","authors":"H. Eberle, N. Gura","doi":"10.1109/SC.2002.10042","DOIUrl":"https://doi.org/10.1109/SC.2002.10042","url":null,"abstract":"An interconnect for a high-performance cluster has to be optimized in respect to both high throughput and low latency. To avoid the tradeoff between throughput and latency, the cluster interconnect Clint has a segregated architecture that provides two physically separate transmission channels: A bulk channel optimized for high-bandwidth traffic and a quick channel optimized for low-latency traffic. Different scheduling strategies are applied. The bulk channel uses a scheduler that globally allocates time slots on the transmission paths before packets are sent off. This way collisions as well as blockages are avoided. In contrast, the quick channel takes a best-effort approach by sending packets whenever they are available thereby risking collisions and retransmissions. Simulation results clearly show the performance advantages of the segregated architecture. The carefully scheduled bulk channel can be loaded nearly to its full capacity without exhibiting head-of-line blocking that limits many networks while the quick channel provides low-latency communication even in the presence of high-bandwidth traffic.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133574569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13