2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis最新文献

筛选
英文 中文
Performance optimization of TCP/IP over 10 Gigabit Ethernet by precise instrumentation 通过精密仪器对10千兆以太网上的TCP/IP进行性能优化
Takeshi Yoshino, Yutaka Sugawara, K. Inagami, J. Tamatsukuri, M. Inaba, K. Hiraki
{"title":"Performance optimization of TCP/IP over 10 Gigabit Ethernet by precise instrumentation","authors":"Takeshi Yoshino, Yutaka Sugawara, K. Inagami, J. Tamatsukuri, M. Inaba, K. Hiraki","doi":"10.5555/1413370.1413382","DOIUrl":"https://doi.org/10.5555/1413370.1413382","url":null,"abstract":"End-to-end communications on 10 Gigabit Ethernet (10 GbE) WAN became popular. However, there are difficulties that need to be solved before utilizing Long Fat-pipe Networks (LFNs) by using TCP. We observed that the followings caused performance depression: short-term bursty data transfer, mismatch between TCP and hardware support, and excess CPU load. In this research, we have established systematic methodologies to optimize TCP on LFNs. In order to pinpoint causes of performance depression, we analyzed real networks precisely by using our hardware-based wire-rate analyzer with 100-ns time-resolution. We took the following actions on the basis of the observations: (1) utilizing hardware-based pacing to avoid unnecessary packet losses due to collisions at bottlenecks, (2) modifying TCP to adapt packet coalescing mechanism, (3) modifying programs to reduce memory copies. We have achieved a constant through-put of 9.08 Gbps on a 500 ms RTT network for 5 h. Our approach has overcome the difficulties on single-end 10 GbE LFNs.","PeriodicalId":230761,"journal":{"name":"2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126052213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 33
Adapting a message-driven parallel application to GPU-accelerated clusters 使消息驱动的并行应用程序适应gpu加速的集群
James C. Phillips, J. Stone, K. Schulten
{"title":"Adapting a message-driven parallel application to GPU-accelerated clusters","authors":"James C. Phillips, J. Stone, K. Schulten","doi":"10.1109/SC.2008.5214716","DOIUrl":"https://doi.org/10.1109/SC.2008.5214716","url":null,"abstract":"Graphics processing units (GPUs) have become an attractive option for accelerating scientific computations as a result of advances in the performance and flexibility of GPU hardware, and due to the availability of GPU software development tools targeting general purpose and scientific computation. However, effective use of GPUs in clusters presents a number of application development and system integration challenges. We describe strategies for the decomposition and scheduling of computation among CPU cores and GPUs, and techniques for overlapping communication and CPU computation with GPU kernel execution. We report the adaptation of these techniques to NAMD, a widely-used parallel molecular dynamics simulation package, and present performance results for a 64-core 64-GPU cluster.","PeriodicalId":230761,"journal":{"name":"2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128272900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 185
Scientific application-based performance comparison of SGI Altix 4700, IBM POWER5+, and SGI ICE 8200 supercomputers SGI Altix 4700、IBM POWER5+和SGI ICE 8200超级计算机基于科学应用程序的性能比较
S. Saini, Dale Talcott, D. Jespersen, M. J. Djomehri, Haoqiang Jin, R. Biswas
{"title":"Scientific application-based performance comparison of SGI Altix 4700, IBM POWER5+, and SGI ICE 8200 supercomputers","authors":"S. Saini, Dale Talcott, D. Jespersen, M. J. Djomehri, Haoqiang Jin, R. Biswas","doi":"10.1145/1413370.1413378","DOIUrl":"https://doi.org/10.1145/1413370.1413378","url":null,"abstract":"The suitability of next-generation high-performance computing systems for petascale simulations will depend on various performance factors attributable to processor, memory, local and global network, and input/output characteristics. In this paper, we evaluate performance of new dual-core SGI Altix 4700, quad-core SGI Altix ICE 8200, and dual-core IBM POWER5+ systems. To measure performance, we used micro-benchmarks from High Performance Computing Challenge (HPCC), NAS Parallel Benchmarks (NPB), and four real-world applications- three from computational fluid dynamics (CFD) and one from climate modeling. We used the micro-benchmarks to develop a controlled understanding of individual system components, then analyzed and interpreted performance of the NPBs and applications. We also explored the hybrid programming model (MPI+OpenMP) using multi-zone NPBs and the CFD application OVERFLOW-2. Achievable application performance is compared across the systems. For the ICE platform, we also investigated the effect of memory bandwidth on performance by testing 1, 2, 4, and 8 cores per node.","PeriodicalId":230761,"journal":{"name":"2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114202042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 35
The cost of doing science on the cloud: The Montage example 在云上进行科学研究的成本:蒙太奇的例子
E. Deelman, Gurmeet Singh, M. Livny, B. Berriman, J. Good
{"title":"The cost of doing science on the cloud: The Montage example","authors":"E. Deelman, Gurmeet Singh, M. Livny, B. Berriman, J. Good","doi":"10.1109/SC.2008.5217932","DOIUrl":"https://doi.org/10.1109/SC.2008.5217932","url":null,"abstract":"Utility grids such as the Amazon EC2 cloud and Amazon S3 offer computational and storage resources that can be used on-demand for a fee by compute and data-intensive applications. The cost of running an application on such a cloud depends on the compute, storage and communication resources it will provision and consume. Different execution plans of the same application may result in significantly different costs. Using the Amazon cloud fee structure and a real-life astronomy application, we study via simulation the cost performance tradeoffs of different execution and resource provisioning plans. We also study these trade-offs in the context of the storage and communication fees of Amazon S3 when used for long-term application data archival. Our results show that by provisioning the right amount of storage and compute resources, cost can be significantly reduced with no significant impact on application performance.","PeriodicalId":230761,"journal":{"name":"2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"1969 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130013178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 815
Extending CC-NUMA systems to support write update optimizations 扩展CC-NUMA系统以支持写更新优化
Liqun Cheng, J. Carter
{"title":"Extending CC-NUMA systems to support write update optimizations","authors":"Liqun Cheng, J. Carter","doi":"10.1145/1413370.1413401","DOIUrl":"https://doi.org/10.1145/1413370.1413401","url":null,"abstract":"Processor stalls and protocol messages caused by coherence misses limit the performance of shared memory applications. Modern multiprocessors employ write-invalidate coherence protocols, which induce read misses to ensure consistency. Previous research has shown that an invalidate protocol is not optimal for all memory access patterns - an update protocol can significantly outperform an invalidate protocol when data is heavily shared or accessed in predictable patterns. However, update protocols can generate excessive network traffic and are difficult to build on a scalable (non-bus) interconnect. To obtain the benefits of both invalidate and update protocols, we built a speculative sequentially consistent write- update mechanism on top of a write-invalidate protocol. To ensure coherence, a processor wishing to write to a block of data uses a traditional write-invalidate protocol to obtain exclusive access to the block before modifying it. To improve performance, the writing processor can later self- downgrade the modified block to the shared state and flush it back to its home node, which forwards the new data to processors that it predicts are likely to consume the data. We present a practical and cost-effective design for extending CC-NUMA systems to support this speculative update mechanism that requires no changes to the processor core, bus interface, or memory consistency model. We also present two hardware-efficient mechanisms for detecting access patterns that benefit from the speculative update mechanism, stable reader set and stream. We evaluate our update mechanisms on a wide range of scientific benchmarks and commercial applications. Using a cycle-accurate execution-driven simulator of a future 16-node SGI multiprocessor, we find that the mechanisms proposed in this paper reduce the average remote miss rate by 30%, reduce network traffic by 15%, and improve performance by 10%, and in no case hurt performance.","PeriodicalId":230761,"journal":{"name":"2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130161632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
Dynamically adapting file domain partitioning methods for collective I/O based on underlying parallel file system locking protocols 基于底层并行文件系统锁定协议的集体I/O动态调整文件域分区方法
W. Liao, A. Choudhary
{"title":"Dynamically adapting file domain partitioning methods for collective I/O based on underlying parallel file system locking protocols","authors":"W. Liao, A. Choudhary","doi":"10.1145/1413370.1413374","DOIUrl":"https://doi.org/10.1145/1413370.1413374","url":null,"abstract":"Collective I/O, such as that provided in MPI-IO, enables process collaboration among a group of processes for greater I/O parallelism. Its implementation involves file domain partitioning, and having the right partitioning is a key to achieving high-performance I/O. As modern parallel file systems maintain data consistency by adopting a distributed file locking mechanism to avoid centralized lock management, different locking protocols can have significant impact to the degree of parallelism of a given file domain partitioning method. In this paper, we propose dynamic file partitioning methods that adapt according to the underlying locking protocols in the parallel file systems and evaluate the performance of four partitioning methods under two locking protocols. By running multiple I/O benchmarks, our experiments demonstrate that no single partitioning guarantees the best performance. Using MPI-IO as an implementation platform, we provide guidelines to select the most appropriate partitioning methods for various I/O patterns and file systems.","PeriodicalId":230761,"journal":{"name":"2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"473 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133432417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 103
Nimrod/K: Towards massively parallel dynamic Grid workflows Nimrod/K:走向大规模并行动态网格工作流
D. Abramson, C. Enticott, I. Altintas
{"title":"Nimrod/K: Towards massively parallel dynamic Grid workflows","authors":"D. Abramson, C. Enticott, I. Altintas","doi":"10.1109/SC.2008.5215726","DOIUrl":"https://doi.org/10.1109/SC.2008.5215726","url":null,"abstract":"A challenge for Grid computing is the difficulty in developing software that is parallel, distributed and highly dynamic. Whilst there have been many general purpose mechanisms developed over the years, Grid programming still remains a low level, error prone task. Scientific workflow engines can double as programming environments, and allow a user to compose dasiavirtualpsila Grid applications from pre-existing components. Whilst existing workflow engines can specify arbitrary parallel programs, (where components use message passing) they are typically not effective with large and variable parallelism. Here we discuss dynamic dataflow, originally developed for parallel tagged dataflow architectures (TDAs), and show that these can be used for implementing Grid workflows. TDAs spawn parallel threads dynamically without additional programming. We have added TDAs to Kepler, and show that the system can orchestrate workflows that have large amounts of variable parallelism. We demonstrate the system using case studies in chemistry and in cardiac modelling.","PeriodicalId":230761,"journal":{"name":"2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"209 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114315400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 84
EpiSimdemics: An efficient algorithm for simulating the spread of infectious disease over large realistic social networks 流行病:一种有效的算法,用于模拟传染病在大型现实社会网络中的传播
C. Barrett, K. Bisset, S. Eubank, Xizhou Feng, M. Marathe
{"title":"EpiSimdemics: An efficient algorithm for simulating the spread of infectious disease over large realistic social networks","authors":"C. Barrett, K. Bisset, S. Eubank, Xizhou Feng, M. Marathe","doi":"10.1109/SC.2008.5214892","DOIUrl":"https://doi.org/10.1109/SC.2008.5214892","url":null,"abstract":"Preventing and controlling outbreaks of infectious diseases such as pandemic influenza is a top public health priority. We describe EpiSimdemics - a scalable parallel algorithm to simulate the spread of contagion in large, realistic social contact networks using individual-based models. EpiSimdemics is an interaction-based simulation of a certain class of stochastic reaction-diffusion processes. Straightforward simulations of such process do not scale well, limiting the use of individual-based models to very small populations. EpiSimdemics is specifically designed to scale to social networks with 100 million individuals. The scaling is obtained by exploiting the semantics of disease evolution and disease propagation in large networks. We evaluate an MPI-based parallel implementation of EpiSimdemics on a mid-sized HPC system, demonstrating that EpiSimdemics scales well. EpiSimdemics has been used in numerous sponsor defined case studies targeted at policy planning and course of action analysis, demonstrating the usefulness of EpiSimdemics in practical situations.","PeriodicalId":230761,"journal":{"name":"2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124037123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 302
Using overlays for efficient data transfer over shared wide-area networks 利用覆盖层在共享广域网上进行有效的数据传输
Gaurav Khanna, Ümit V. Çatalyürek, T. Kurç, R. Kettimuthu, P. Sadayappan, Ian T Foster, J. Saltz
{"title":"Using overlays for efficient data transfer over shared wide-area networks","authors":"Gaurav Khanna, Ümit V. Çatalyürek, T. Kurç, R. Kettimuthu, P. Sadayappan, Ian T Foster, J. Saltz","doi":"10.1145/1413370.1413418","DOIUrl":"https://doi.org/10.1145/1413370.1413418","url":null,"abstract":"Data-intensive applications frequently transfer large amounts of data over wide-area networks. The performance achieved in such settings can often be improved by routing data via intermediate nodes chosen to increase aggregate bandwidth. We explore the benefits of overlay network approaches by designing and implementing a service-oriented architecture that incorporates two key optimizations - multi-hop path splitting and multi-pathing - within the GridFTP file transfer protocol. We develop a file transfer scheduling algorithm that incorporates the two optimizations in conjunction with the use of available file replicas. The algorithm makes use of information from past GridFTP transfers to estimate network bandwidths and resource availability. The effectiveness of these optimizations is evaluated using several application file transfer patterns: one-to-all broadcast, all-to-one gather, and data redistribution, on a wide-area testbed. The experimental results show that our architecture and algorithm achieve significant performance improvement.","PeriodicalId":230761,"journal":{"name":"2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130209022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 48
Capturing performance knowledge for automated analysis 为自动分析捕获性能知识
K. Huck, Oscar R. Hernandez, Van Bui, S. Chandrasekaran, B. Chapman, A. Malony, L. McInnes, B. Norris
{"title":"Capturing performance knowledge for automated analysis","authors":"K. Huck, Oscar R. Hernandez, Van Bui, S. Chandrasekaran, B. Chapman, A. Malony, L. McInnes, B. Norris","doi":"10.1109/SC.2008.5222642","DOIUrl":"https://doi.org/10.1109/SC.2008.5222642","url":null,"abstract":"Automating the process of parallel performance experimentation, analysis, and problem diagnosis can enhance environments for performance-directed application development, compilation, and execution. This is especially true when parametric studies, modeling, and optimization strategies require large amounts of data to be collected and processed for knowledge synthesis and reuse. This paper describes the integration of the PerfExplorer performance data mining framework with the OpenUH compiler infrastructure. OpenUH provides auto-instrumentation of source code for performance experimentation and PerfExplorer provides automated and reusable analysis of the performance data through a scripting interface. More importantly, PerfExplorer inference rules have been developed to recognize and diagnose performance characteristics important for optimization strategies and modeling. Three case studies are presented which show our success with automation in OpenMP and MPI code tuning, parametric characterization, Pand power modeling. The paper discusses how the integration supports performance knowledge engineering across applications and feedback-based compiler optimization in general.","PeriodicalId":230761,"journal":{"name":"2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126856865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 26
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信