2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis: Latest Papers

Performance optimization of TCP/IP over 10 Gigabit Ethernet by precise instrumentation

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. Pub Date: 2008-11-15. DOI: 10.5555/1413370.1413382

Takeshi Yoshino, Yutaka Sugawara, K. Inagami, J. Tamatsukuri, M. Inaba, K. Hiraki

Citations: 33

Adapting a message-driven parallel application to GPU-accelerated clusters

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. Pub Date: 2008-11-15. DOI: 10.1109/SC.2008.5214716

James C. Phillips, J. Stone, K. Schulten

Citations: 185

Scientific application-based performance comparison of SGI Altix 4700, IBM POWER5+, and SGI ICE 8200 supercomputers

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. Pub Date: 2008-11-15. DOI: 10.1145/1413370.1413378

S. Saini, Dale Talcott, D. Jespersen, M. J. Djomehri, Haoqiang Jin, R. Biswas

Abstract: The suitability of next-generation high-performance computing systems for petascale simulations will depend on various performance factors attributable to processor, memory, local and global network, and input/output characteristics. In this paper, we evaluate the performance of new dual-core SGI Altix 4700, quad-core SGI Altix ICE 8200, and dual-core IBM POWER5+ systems. To measure performance, we used micro-benchmarks from the High Performance Computing Challenge (HPCC), the NAS Parallel Benchmarks (NPB), and four real-world applications: three from computational fluid dynamics (CFD) and one from climate modeling. We used the micro-benchmarks to develop a controlled understanding of individual system components, then analyzed and interpreted the performance of the NPBs and applications. We also explored the hybrid programming model (MPI+OpenMP) using multi-zone NPBs and the CFD application OVERFLOW-2. Achievable application performance is compared across the systems. For the ICE platform, we also investigated the effect of memory bandwidth on performance by testing 1, 2, 4, and 8 cores per node.

Citations: 35

The cost of doing science on the cloud: The Montage example

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. Pub Date: 2008-11-15. DOI: 10.1109/SC.2008.5217932

E. Deelman, Gurmeet Singh, M. Livny, B. Berriman, J. Good

Citations: 815

Extending CC-NUMA systems to support write update optimizations

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. Pub Date: 2008-11-15. DOI: 10.1145/1413370.1413401

Liqun Cheng, J. Carter

Abstract: Processor stalls and protocol messages caused by coherence misses limit the performance of shared memory applications. Modern multiprocessors employ write-invalidate coherence protocols, which induce read misses to ensure consistency. Previous research has shown that an invalidate protocol is not optimal for all memory access patterns: an update protocol can significantly outperform an invalidate protocol when data is heavily shared or accessed in predictable patterns. However, update protocols can generate excessive network traffic and are difficult to build on a scalable (non-bus) interconnect. To obtain the benefits of both invalidate and update protocols, we built a speculative sequentially consistent write-update mechanism on top of a write-invalidate protocol. To ensure coherence, a processor wishing to write to a block of data uses a traditional write-invalidate protocol to obtain exclusive access to the block before modifying it. To improve performance, the writing processor can later self-downgrade the modified block to the shared state and flush it back to its home node, which forwards the new data to processors that it predicts are likely to consume the data. We present a practical and cost-effective design for extending CC-NUMA systems to support this speculative update mechanism that requires no changes to the processor core, bus interface, or memory consistency model. We also present two hardware-efficient mechanisms for detecting access patterns that benefit from the speculative update mechanism: stable reader set and stream. We evaluate our update mechanisms on a wide range of scientific benchmarks and commercial applications. Using a cycle-accurate execution-driven simulator of a future 16-node SGI multiprocessor, we find that the mechanisms proposed in this paper reduce the average remote miss rate by 30%, reduce network traffic by 15%, and improve performance by 10%, and in no case hurt performance.

Citations: 8
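
The abstract names a "stable reader set" pattern as one trigger for speculative updates but does not give the detection rule. The following Python sketch is a software model under an assumed rule: if the set of nodes that read a block between two consecutive writes is the same twice in a row, the home node forwards the next write to those nodes. The class and method names are hypothetical, and the paper's mechanism lives in directory hardware rather than software.

```python
from collections import defaultdict


class StableReaderSetPredictor:
    """Hypothetical software model of a per-block stable-reader-set detector."""

    def __init__(self):
        # Nodes that have read each block since that block's last write.
        self.current_readers = defaultdict(set)
        # Reader set observed during the previous write interval of each block.
        self.previous_readers = {}

    def on_read(self, block, node):
        """Record that `node` read `block`."""
        self.current_readers[block].add(node)

    def on_write(self, block, writer):
        """Called when `writer` self-downgrades `block` after modifying it.

        Returns the nodes the home node would speculatively forward the new
        data to: the current reader set, if it matches the previous interval's.
        """
        readers = frozenset(self.current_readers.pop(block, set()))
        prior = self.previous_readers.get(block)
        self.previous_readers[block] = readers
        if prior is not None and prior == readers:
            return sorted(readers - {writer})
        return []
```

Under this assumed rule, a block read by nodes 1 and 2 in two successive intervals triggers forwarding on the second write, while a change in readers stops it. A stricter predictor could require more matching intervals, trading fewer useful updates for less wasted network traffic.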

Dynamically adapting file domain partitioning methods for collective I/O based on underlying parallel file system locking protocols

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. Pub Date: 2008-11-15. DOI: 10.1145/1413370.1413374

W. Liao, A. Choudhary

Abstract: Collective I/O, such as that provided in MPI-IO, lets a group of processes collaborate to achieve greater I/O parallelism. Its implementation involves file domain partitioning, and choosing the right partitioning is key to achieving high-performance I/O. Because modern parallel file systems maintain data consistency through distributed file locking mechanisms that avoid centralized lock management, different locking protocols can significantly affect the degree of parallelism a given file domain partitioning method achieves. In this paper, we propose dynamic file partitioning methods that adapt to the underlying locking protocols of parallel file systems, and we evaluate the performance of four partitioning methods under two locking protocols. Experiments with multiple I/O benchmarks demonstrate that no single partitioning method guarantees the best performance. Using MPI-IO as an implementation platform, we provide guidelines for selecting the most appropriate partitioning method for various I/O patterns and file systems.

Citations: 103
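
The abstract does not list the four partitioning methods it evaluates. As an illustration only, the Python sketch below contrasts two common ways to split an aggregate access region among I/O aggregators: an even split by byte count, and a split whose interior boundaries are aligned to the file system stripe size. It assumes locks are granted at stripe granularity, so a stripe shared by two aggregators causes lock contention. The function names are hypothetical and are not MPI-IO or ROMIO APIs.

```python
def even_partition(start, end, n_aggr):
    """Split the byte range [start, end) into n_aggr contiguous domains
    whose sizes differ by at most one byte."""
    base, extra = divmod(end - start, n_aggr)
    domains, lo = [], start
    for i in range(n_aggr):
        hi = lo + base + (1 if i < extra else 0)
        domains.append((lo, hi))
        lo = hi
    return domains


def stripe_aligned_partition(start, end, n_aggr, stripe):
    """Split [start, end) into n_aggr domains whose interior boundaries fall
    on multiples of the stripe size, so no stripe is shared by two domains."""
    first = start // stripe   # index of the stripe containing `start`
    last = -(-end // stripe)  # ceiling division: one past the last stripe touched
    domains = []
    # Divide whole stripes evenly, then clip the outer domains to [start, end).
    for lo_s, hi_s in even_partition(first, last, n_aggr):
        hi = min(end, hi_s * stripe)
        lo = min(max(start, lo_s * stripe), hi)
        domains.append((lo, hi))
    return domains
```

For the region [100, 1000) with three aggregators and a 256-byte stripe, the even split places boundaries at bytes 400 and 700, both inside stripes, so under the assumed locking model two aggregators would contend for each of those stripes. The stripe-aligned split places its boundaries at 512 and 768 instead.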

Nimrod/K: Towards massively parallel dynamic Grid workflows

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. Pub Date: 2008-11-15. DOI: 10.1109/SC.2008.5215726

D. Abramson, C. Enticott, I. Altintas

Citations: 84

EpiSimdemics: An efficient algorithm for simulating the spread of infectious disease over large realistic social networks

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. Pub Date: 2008-11-15. DOI: 10.1109/SC.2008.5214892

C. Barrett, K. Bisset, S. Eubank, Xizhou Feng, M. Marathe

Abstract: Preventing and controlling outbreaks of infectious diseases such as pandemic influenza is a top public health priority. We describe EpiSimdemics, a scalable parallel algorithm to simulate the spread of contagion in large, realistic social contact networks using individual-based models. EpiSimdemics is an interaction-based simulation of a certain class of stochastic reaction-diffusion processes. Straightforward simulations of such processes do not scale well, limiting the use of individual-based models to very small populations. EpiSimdemics is specifically designed to scale to social networks with 100 million individuals. The scaling is obtained by exploiting the semantics of disease evolution and disease propagation in large networks. We evaluate an MPI-based parallel implementation of EpiSimdemics on a mid-sized HPC system, demonstrating that EpiSimdemics scales well. EpiSimdemics has been used in numerous sponsor-defined case studies targeted at policy planning and course-of-action analysis, demonstrating its usefulness in practical situations.

Citations: 302
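
The EpiSimdemics abstract describes an interaction-based simulation in which contagion spreads through contacts between individuals. The Python sketch below models one simulated day in three phases (people report visits to locations, locations compute exposures among co-present people, people draw infection outcomes). It rests on assumptions not taken from the paper: a single per-minute transmission rate `tau`, an infection probability of 1 - exp(-tau * contact_minutes), and no state changes within a day. It is sequential; in a parallel version the phase boundaries would be natural points for message exchange between processes that own people and processes that own locations.

```python
import math
import random
from collections import defaultdict


def simulate_day(visits, infectious, susceptible, tau, rng):
    """One day of a hypothetical interaction-based contagion model.

    visits:      list of (person, location, start_minute, end_minute)
    infectious:  set of people able to transmit today
    susceptible: set of people who can be infected today
    tau:         assumed per-minute transmission rate of one contact
    rng:         random.Random instance, passed in for reproducibility
    """
    # Phase 1: people report their visits to the locations they attend.
    occupants = defaultdict(list)
    for person, loc, start, end in visits:
        occupants[loc].append((person, start, end))

    # Phase 2: each location computes contact-minutes between co-present
    # susceptible and infectious people, independently of other locations.
    exposure = defaultdict(float)
    for people in occupants.values():
        for s, s_start, s_end in people:
            if s not in susceptible:
                continue
            for i, i_start, i_end in people:
                if i in infectious:
                    overlap = min(s_end, i_end) - max(s_start, i_start)
                    if overlap > 0:
                        exposure[s] += overlap

    # Phase 3: each exposed person draws one infection outcome.
    # Sorting fixes the order of random draws, making runs reproducible.
    newly_infected = set()
    for person in sorted(exposure):
        p = 1.0 - math.exp(-tau * exposure[person])
        if rng.random() < p:
            newly_infected.add(person)
    return newly_infected
```

Summing contact-minutes across locations before a single draw gives the same infection probability as independent per-location draws, because 1 - exp(-tau(a + b)) = 1 - exp(-tau a) exp(-tau b).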

Using overlays for efficient data transfer over shared wide-area networks

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. Pub Date: 2008-11-15. DOI: 10.1145/1413370.1413418

Gaurav Khanna, Ümit V. Çatalyürek, T. Kurç, R. Kettimuthu, P. Sadayappan, Ian T. Foster, J. Saltz

Abstract: Data-intensive applications frequently transfer large amounts of data over wide-area networks. The performance achieved in such settings can often be improved by routing data via intermediate nodes chosen to increase aggregate bandwidth. We explore the benefits of overlay network approaches by designing and implementing a service-oriented architecture that incorporates two key optimizations, multi-hop path splitting and multi-pathing, within the GridFTP file transfer protocol. We develop a file transfer scheduling algorithm that incorporates the two optimizations in conjunction with the use of available file replicas. The algorithm uses information from past GridFTP transfers to estimate network bandwidths and resource availability. The effectiveness of these optimizations is evaluated on a wide-area testbed using several application file transfer patterns: one-to-all broadcast, all-to-one gather, and data redistribution. The experimental results show that our architecture and algorithm achieve significant performance improvements.

Citations: 48
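
The abstract names two optimizations, multi-hop path splitting and multi-pathing, and says bandwidth is estimated from past GridFTP transfers, but gives no formulas. The Python sketch below assumes simple rules for illustration: a path's bandwidth is total bytes over total seconds of its past transfers, a relayed path is limited by its slowest hop, and a file is split across paths in proportion to estimated bandwidth so all parts are expected to finish together. The function and path names are hypothetical; this is neither the paper's scheduling algorithm nor a GridFTP API.

```python
def estimate_bandwidth(history):
    """Estimate a path's bandwidth (bytes/s) from past (bytes, seconds) transfers."""
    total_bytes = sum(b for b, _ in history)
    total_secs = sum(s for _, s in history)
    return total_bytes / total_secs


def relay_path_bandwidth(hop_bandwidths):
    """A multi-hop path through intermediate nodes is limited by its slowest hop."""
    return min(hop_bandwidths)


def split_across_paths(file_size, bandwidths):
    """Assign each path a byte count proportional to its estimated bandwidth,
    so that all parts are expected to finish at about the same time."""
    total_bw = sum(bandwidths.values())
    paths = sorted(bandwidths)
    shares, assigned = {}, 0
    for path in paths[:-1]:
        shares[path] = int(file_size * bandwidths[path] / total_bw)
        assigned += shares[path]
    shares[paths[-1]] = file_size - assigned  # rounding remainder goes to the last path
    return shares
```

For example, with a direct path estimated at 100 bytes/s and a relayed path whose hops run at 400 and 300 bytes/s, the relayed path is treated as a 300 bytes/s path and receives three quarters of a 1000-byte file.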

Capturing performance knowledge for automated analysis

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. Pub Date: 2008-11-15. DOI: 10.1109/SC.2008.5222642

K. Huck, Oscar R. Hernandez, Van Bui, S. Chandrasekaran, B. Chapman, A. Malony, L. McInnes, B. Norris

Abstract: Automating the process of parallel performance experimentation, analysis, and problem diagnosis can enhance environments for performance-directed application development, compilation, and execution. This is especially true when parametric studies, modeling, and optimization strategies require large amounts of data to be collected and processed for knowledge synthesis and reuse. This paper describes the integration of the PerfExplorer performance data mining framework with the OpenUH compiler infrastructure. OpenUH provides auto-instrumentation of source code for performance experimentation, and PerfExplorer provides automated and reusable analysis of the performance data through a scripting interface. More importantly, PerfExplorer inference rules have been developed to recognize and diagnose performance characteristics important for optimization strategies and modeling. Three case studies show successful automation in OpenMP and MPI code tuning, parametric characterization, and power modeling. The paper discusses how the integration supports performance knowledge engineering across applications and feedback-based compiler optimization in general.

Citations: 26
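
The abstract does not spell out PerfExplorer's inference rules. To illustrate what a rule that recognizes and diagnoses a performance characteristic might look like, the Python sketch below flags load imbalance when the slowest thread in a code region takes more than a chosen multiple of the mean time. The input format, the 1.25 threshold, and the function name are assumptions; this is not PerfExplorer's rule syntax or API.

```python
from statistics import mean


def diagnose(region_times, imbalance_threshold=1.25):
    """Apply one hypothetical diagnosis rule to per-region timing data.

    region_times: {region name: [time spent by each thread or process]}
    Flags a region as load-imbalanced when max/mean exceeds the threshold.
    """
    findings = []
    for region, times in sorted(region_times.items()):
        avg = mean(times)
        ratio = max(times) / avg if avg > 0 else 1.0
        if ratio > imbalance_threshold:
            findings.append(f"{region}: load imbalance (max/mean = {ratio:.2f})")
    return findings
```

Keeping each rule as a small, data-driven check is what makes it reusable across applications: the same rule can run over timing data from any instrumented code without modification.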