{"title":"High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet","authors":"S. Pakin, Mario Lauria, A. Chien","doi":"10.1109/SUPERC.1995.32","DOIUrl":null,"url":null,"abstract":"In most computer systems, software overhead dominates the cost of messaging, reducing delivered performance, especially for short messages. Efficient software messaging layers are needed to deliver the hardware performance to the application level and to support tightly-coupled workstation clusters. Illinois Fast Messages (FM) 1.0 is a high speed messaging layer that delivers low latency and high bandwidth for short messages. For 128-byte packets, FM achieves bandwidths of 16.2MB/s and one-way latencies 32 µs on Myrinet-connected SPARCstations (user-level to user-level). For shorter packets, we have measured one-way latencies of 25 µs, and for larger packets, bandwidth as high as to 19.6MB/s — delivered bandwidth greater than OC-3. FM is also superior to the Myrinet API messaging layer, not just in terms of latency and usable bandwidth, but also in terms of the message half-power point (n_{\\frac{1}{2}}), which is two orders of magnitude smaller (54 vs. 4,409 bytes). We describe the FM messaging primitives and the critical design issues in building a low-latency messaging layers for workstation clusters. Several issues are critical: the division of labor between host and network coprocessor, management of the input/output (I/O) bus, and buffer management. To achieve high performance, messaging layers should assign as much functionality as possible to the host. If the network interface has DMA capability, the I/Obus should be used asymmetrically, with the host processor moving data to the network and exploiting DMA to move data to the host. Finally, buffer management should be extremely simple in the network coprocessor and match queue structures between the network coprocessor and host memory. Detailed measurements show how each of these features contribute to high performance.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"476","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the IEEE/ACM SC95 Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SUPERC.1995.32","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 476
Abstract
In most computer systems, software overhead dominates the cost of messaging, reducing delivered performance, especially for short messages. Efficient software messaging layers are needed to deliver the hardware performance to the application level and to support tightly-coupled workstation clusters. Illinois Fast Messages (FM) 1.0 is a high speed messaging layer that delivers low latency and high bandwidth for short messages. For 128-byte packets, FM achieves bandwidths of 16.2MB/s and one-way latencies 32 µs on Myrinet-connected SPARCstations (user-level to user-level). For shorter packets, we have measured one-way latencies of 25 µs, and for larger packets, bandwidth as high as to 19.6MB/s — delivered bandwidth greater than OC-3. FM is also superior to the Myrinet API messaging layer, not just in terms of latency and usable bandwidth, but also in terms of the message half-power point (n_{\frac{1}{2}}), which is two orders of magnitude smaller (54 vs. 4,409 bytes). We describe the FM messaging primitives and the critical design issues in building a low-latency messaging layers for workstation clusters. Several issues are critical: the division of labor between host and network coprocessor, management of the input/output (I/O) bus, and buffer management. To achieve high performance, messaging layers should assign as much functionality as possible to the host. If the network interface has DMA capability, the I/Obus should be used asymmetrically, with the host processor moving data to the network and exploiting DMA to move data to the host. Finally, buffer management should be extremely simple in the network coprocessor and match queue structures between the network coprocessor and host memory. Detailed measurements show how each of these features contribute to high performance.