An Auto-Tuning Method for High-Bandwidth Low-Latency Approximate Interconnection Networks

2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) Pub Date : 2023-03-01 DOI:10.1109/PDP59025.2023.00011

S. Hirasawa, M. Koibuchi



Abstract

Next-generation interconnection networks, such as those based on the 400 GbE specification, impose a Forward Error Correction (FEC) operation, such as RS-FEC (544,514), on incoming packets at every switch. The significant FEC latency increases end-to-end communication latency, which degrades application performance on parallel computers. To resolve the FEC latency problem, prior work presented error-prone high-bandwidth low-latency networks that do not perform the FEC operation. These networks offer both high-bandwidth approximate data transfer and low-bandwidth perfect data transfer, so they can support parallel applications that tolerate different probabilities of bit-flip occurrence. As the number of approximate data transfers increases, a parallel application can obtain a significant speedup at the expense of a moderately degraded quality of results (QoR). However, it is difficult for users to decide whether each communication should be approximate so as to obtain the shortest execution time with sufficient QoR for a given parallel application. In this study, we apply an auto-tuning framework to approximate interconnection networks; it automatically identifies whether each communication should use approximate data transfer by attempting thousands of executions of a given parallel application. Auto-tuning runs the program a large number of times while varying the possible communication parameters to find the best execution configuration. Different executions place bit flips at different positions in the communication data, so the quality of results may differ even when the same parameters are used. Although this uncertainty makes the auto-tuning optimization more difficult, many offline trials lead to a high probability that the program executes successfully. Evaluation results show that high-performance MPI applications tuned with our method achieve an average performance improvement of 1.30x on error-prone high-performance approximate networks.
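The tuning loop described in the abstract can be illustrated with a toy model. The Python sketch below is not the authors' framework: the per-transfer costs, the per-transfer bit-flip probabilities, the QoR model, the thresholds, and the names `run_once` and `autotune` are all assumptions made for illustration, and exhaustive enumeration stands in for whatever search strategy the paper actually uses. It only shows the general idea of running each candidate configuration many times, estimating how often the result quality is acceptable, and keeping the fastest configuration that succeeds often enough.

```python
import itertools
import random

EXACT_COST = 1.0    # assumed time units per FEC-protected (perfect) transfer
APPROX_COST = 0.6   # assumed time units per FEC-free (approximate) transfer


def run_once(config, flip_prob, rng):
    """Simulate one execution; config[i] is True if transfer i is approximate."""
    time = sum(APPROX_COST if approx else EXACT_COST for approx in config)
    # A bit flip can only hit a transfer that was sent without FEC.
    flips = sum(1 for approx, p in zip(config, flip_prob)
                if approx and rng.random() < p)
    qor = 1.0 - 0.2 * flips  # toy quality model: each flip costs 0.2
    return time, qor


def autotune(flip_prob, trials=200, qor_min=0.9, success_min=0.95, seed=0):
    """Return (config, time, success_rate) for the fastest acceptable config."""
    rng = random.Random(seed)
    best = None
    # Enumerate every approximate/exact assignment (feasible only for few transfers).
    for config in itertools.product([False, True], repeat=len(flip_prob)):
        results = [run_once(config, flip_prob, rng) for _ in range(trials)]
        success = sum(q >= qor_min for _, q in results) / trials
        time = results[0][0]  # time depends only on config in this toy model
        if success >= success_min and (best is None or time < best[1]):
            best = (config, time, success)
    return best


if __name__ == "__main__":
    # Hypothetical application with four transfers; only the third is error-sensitive.
    config, time, success = autotune([0.0, 0.0, 0.5, 0.0])
    print(config, time, success)
```

In this toy setting the tuner should keep the third transfer exact and mark the other three as approximate, because repeated trials show that approximating the sensitive transfer fails the quality threshold too often. Repeating each configuration is what handles the run-to-run variation in bit-flip positions that the abstract mentions.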
