Motohiko Matsuda, T. Kudoh, Yuetsu Kodama, Ryousei Takano, Y. Ishikawa
{"title":"TCP Adaptation for MPI on Long-and-Fat Networks","authors":"Motohiko Matsuda, T. Kudoh, Yuetsu Kodama, Ryousei Takano, Y. Ishikawa","doi":"10.1109/CLUSTR.2005.347034","DOIUrl":null,"url":null,"abstract":"Typical MPI applications work in phases of computation and communication, and messages are exchanged in relatively small chunks. This behavior is not optimal for TCP because TCP is designed only to handle a contiguous flow of messages efficiently. This behavior anomaly is well-known, but fixes are not integrated into today's TCP implementations, even though performance is seriously degraded, especially for MPI applications. This paper proposes three improvements in the Linux TCP stack: i.e., pacing at start-up, reducing Retransmit-Timeout time, and TCP parameter switching at the transition of computation phases in an MPI application. Evaluation of these improvements using the NAS parallel benchmarks shows that the BT, CG, IS, and SP benchmarks achieved 10 to 30 percent improvements. On the other hand, the FT and MG benchmarks showed no improvement because they have the steady communication that TCP assumes, and the LU benchmark became slightly worse because it has very little communication","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2005 IEEE International Conference on Cluster Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CLUSTR.2005.347034","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 20
Abstract
Typical MPI applications work in phases of computation and communication, and messages are exchanged in relatively small chunks. This behavior is not optimal for TCP because TCP is designed only to handle a contiguous flow of messages efficiently. This behavior anomaly is well-known, but fixes are not integrated into today's TCP implementations, even though performance is seriously degraded, especially for MPI applications. This paper proposes three improvements in the Linux TCP stack: i.e., pacing at start-up, reducing Retransmit-Timeout time, and TCP parameter switching at the transition of computation phases in an MPI application. Evaluation of these improvements using the NAS parallel benchmarks shows that the BT, CG, IS, and SP benchmarks achieved 10 to 30 percent improvements. On the other hand, the FT and MG benchmarks showed no improvement because they have the steady communication that TCP assumes, and the LU benchmark became slightly worse because it has very little communication