{"title":"Accelerating List Management for MPI","authors":"K. Underwood, Arun Rodrigues, K. S. Hemmeit","doi":"10.1109/CLUSTR.2005.347036","DOIUrl":null,"url":null,"abstract":"The latency and throughput of MPI messages are critically important to a range of parallel scientific applications. In many modern networks, both of these performance characteristics are largely driven by the performance of a processor on the network interface. Because of the semantics of MPI, this embedded processor is forced to traverse a linked list of posted receives each time a messages is received. As this list grows long, the latency of message reception grows and the throughput of MPI messages decreases. This paper presents a novel hardware feature to handle list management functions on a network interface. By moving functions such as list insertion, list traversal, and list deletion to the hardware unit, latencies are decreased by up to 20% in the zero length queue case with dramatic improvements in the presence of long queues. Similarly, the throughput is increased by up to 10% in the zero length queue case and by nearly 100% in the presence queues of 30 messages","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"339 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2005 IEEE International Conference on Cluster Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CLUSTR.2005.347036","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
The latency and throughput of MPI messages are critically important to a range of parallel scientific applications. In many modern networks, both of these performance characteristics are largely driven by the performance of a processor on the network interface. Because of the semantics of MPI, this embedded processor is forced to traverse a linked list of posted receives each time a messages is received. As this list grows long, the latency of message reception grows and the throughput of MPI messages decreases. This paper presents a novel hardware feature to handle list management functions on a network interface. By moving functions such as list insertion, list traversal, and list deletion to the hardware unit, latencies are decreased by up to 20% in the zero length queue case with dramatic improvements in the presence of long queues. Similarly, the throughput is increased by up to 10% in the zero length queue case and by nearly 100% in the presence queues of 30 messages