Optimization of All-to-All Communication on the Blue Gene/L Supercomputer

2008 37th International Conference on Parallel Processing. Pub Date: 2008-09-09. DOI: 10.1109/ICPP.2008.83

Sameer Kumar, Yogish Sabharwal, R. Garg, P. Heidelberger


Citations: 70

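Abstract

All-to-all communication is a well-known performance bottleneck for many applications, such as those that use the fast Fourier transform (FFT) algorithm. We analyze the performance of all-to-all communication on the Blue Gene/L torus interconnect, which suffers from link contention even for all-to-all operations with short messages. We observed that the performance of all-to-all depends on the shape of the processor partition, and we present a performance analysis of all-to-all on partitions of various shapes. We then present optimization schemes that substantially improve the performance of all-to-all with both short and large messages. In particular, throughput improved from 64% to over 99% of peak on the 65,536-node (64 × 32 × 32) Blue Gene/L machine at Lawrence Livermore National Laboratory. We show the impact of the all-to-all performance optimizations on 1-D and 3-D FFT benchmarks. With our optimized all-to-all, we achieved a performance of over 2.8 TFLOP/s on the HPC Challenge 1-D FFT benchmark.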
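The dependence on partition shape can be illustrated with a standard bisection argument. This is a rough sketch under stated assumptions, not the analysis from the paper itself: cutting a torus across its longest dimension leaves the fewest links to carry the half of the traffic that must cross the cut, so an elongated partition has a lower achievable peak than a cube of the same size. The function name, its parameters, and the link bandwidth value are all hypothetical, and the longest dimension is assumed to be even and larger than 2.

```python
def alltoall_bisection_time(dims, msg_bytes, link_bw):
    """Lower bound (seconds) on all-to-all time from the bisection of a torus.

    dims      -- torus dimensions, e.g. (64, 32, 32)
    msg_bytes -- bytes each node sends to every other node
    link_bw   -- bandwidth of one unidirectional link in bytes/second
                 (an assumed input, not a figure taken from the paper)
    """
    nodes = 1
    for d in dims:
        nodes *= d
    longest = max(dims)
    # Split the longest dimension into two halves. Because of the wraparound
    # links, the cut crosses each ring twice, so 2 * (nodes / longest)
    # unidirectional links lead out of each half.
    links_out = 2 * nodes // longest
    # Every node in one half sends msg_bytes to every node in the other half.
    bytes_out = (nodes // 2) * (nodes // 2) * msg_bytes
    # Equivalent closed form: nodes * longest * msg_bytes / (8 * link_bw).
    return bytes_out / (links_out * link_bw)


if __name__ == "__main__":
    bw = 175e6  # assumed illustrative link bandwidth, bytes/second
    for shape in [(16, 16, 16), (64, 8, 8), (64, 32, 32)]:
        t = alltoall_bisection_time(shape, msg_bytes=1024, link_bw=bw)
        print(shape, f"{t * 1e3:.2f} ms")
```

Under this bound, two partitions with the same node count differ in the best achievable all-to-all time by the ratio of their longest dimensions, which is one way to read the observation that performance depends on partition shape. A measured time divided into this bound gives a fraction-of-peak figure of the kind quoted in the abstract.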


