Title: An Auto-Tuning Method for High-Bandwidth Low-Latency Approximate Interconnection Networks
Authors: S. Hirasawa, M. Koibuchi
Venue: 2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)
Published: 2023-03-01
DOI: 10.1109/PDP59025.2023.00011 (https://doi.org/10.1109/PDP59025.2023.00011)
Citations: 0
Abstract
Next-generation interconnection networks, such as those based on the 400 GbE specification, impose a Forward Error Correction (FEC) operation, such as RS-FEC (544,514), on incoming packets at every switch. The significant FEC latency increases end-to-end communication latency, which degrades application performance on parallel computers. To resolve the FEC latency problem, prior work presented error-prone high-bandwidth low-latency networks that do not perform the FEC operation. These networks provide both high-bandwidth approximate data transfer and low-bandwidth perfect data transfer, supporting parallel applications that tolerate different probabilities of bit flips. As the number of approximate data transfers increases, a parallel application can obtain a significant speedup at the expense of moderately degraded quality of results (QoR). However, it is difficult for users to decide whether each communication should be approximate so as to obtain the shortest execution time with sufficient QoR for a given parallel application. In this study, we apply an auto-tuning framework to approximate interconnection networks; it automatically decides whether each communication should use approximate data transfer by attempting thousands of executions of a given parallel application. The auto-tuner runs the program many times while varying the communication parameters to find the best execution configuration. Because each execution places bit flips at different positions in the communicated data, the quality of results may differ even when the same parameters are used. Although this uncertainty complicates the optimization, many offline trials lead to a high probability that the program executes successfully.
Evaluation results show that high-performance MPI applications tuned with our method achieve a 1.30x average performance improvement on error-prone high-performance approximate networks.
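The tuning loop described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' framework: the abstract does not say which search strategy, cost model, or QoR metric is used, so the exhaustive search, the cost constants `PERFECT_COST` and `APPROX_COST`, the bit-error rate `ber`, the relative-error `tolerance`, and the toy "application" (a sum over received values) are all hypothetical. The sketch shows the core idea: try each per-transfer setting, repeat every setting several times because bit flips land in different places on each run, and keep the fastest setting that meets the QoR target often enough.

```python
import itertools
import random
import struct

PERFECT_COST = 1.3  # hypothetical time units for an error-free transfer
APPROX_COST = 1.0   # hypothetical time units for an approximate transfer


def flip_bits(values, ber, rng):
    """Copy `values`, flipping each bit of every 64-bit float with probability `ber`."""
    noisy = []
    for v in values:
        (bits,) = struct.unpack("<Q", struct.pack("<d", v))
        for i in range(64):
            if rng.random() < ber:
                bits ^= 1 << i
        noisy.append(struct.unpack("<d", struct.pack("<Q", bits))[0])
    return noisy


def run_once(config, payloads, ber, rng):
    """Simulate one execution; return (modelled time, application result)."""
    elapsed, result = 0.0, 0.0
    for approximate, payload in zip(config, payloads):
        received = flip_bits(payload, ber, rng) if approximate else payload
        elapsed += APPROX_COST if approximate else PERFECT_COST
        result += sum(received)  # toy application: sum of everything received
    return elapsed, result


def autotune(payloads, ber, tolerance=1e-3, trials=20, min_success=0.95, seed=0):
    """Exhaustively search per-transfer flags; return the fastest acceptable (config, time)."""
    exact = sum(sum(p) for p in payloads)
    rng = random.Random(seed)
    best = None
    for config in itertools.product((False, True), repeat=len(payloads)):
        successes, elapsed = 0, 0.0
        for _ in range(trials):
            elapsed, result = run_once(config, payloads, ber, rng)
            # NaN or infinite results fail this comparison and count as failures.
            if abs(result - exact) <= tolerance * abs(exact):
                successes += 1
        if successes / trials >= min_success and (best is None or elapsed < best[1]):
            best = (config, elapsed)
    return best
```

Calling `autotune([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], ber=1e-4)` returns a tuple of per-transfer flags (`True` meaning approximate) together with the modelled time. Exhaustive search grows as 2^n in the number of transfers, so a real tuner over many communication sites would need a sampling or heuristic search instead.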