High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet

Proceedings of the IEEE/ACM SC95 Conference. Pub Date: 1995-12-08. DOI: 10.1109/SUPERC.1995.32

S. Pakin, Mario Lauria, A. Chien


Citations: 476

Abstract

In most computer systems, software overhead dominates the cost of messaging, reducing delivered performance, especially for short messages. Efficient software messaging layers are needed to deliver the hardware performance to the application level and to support tightly coupled workstation clusters. Illinois Fast Messages (FM) 1.0 is a high-speed messaging layer that delivers low latency and high bandwidth for short messages. For 128-byte packets, FM achieves a bandwidth of 16.2 MB/s and a one-way latency of 32 µs on Myrinet-connected SPARCstations (user-level to user-level). For shorter packets, we have measured one-way latencies of 25 µs, and for larger packets, bandwidth as high as 19.6 MB/s (delivered bandwidth greater than OC-3). FM is also superior to the Myrinet API messaging layer, not just in terms of latency and usable bandwidth, but also in terms of the message half-power point ($n_{1/2}$), which is two orders of magnitude smaller (54 vs. 4,409 bytes). We describe the FM messaging primitives and the critical design issues in building low-latency messaging layers for workstation clusters. Several issues are critical: the division of labor between host and network coprocessor, management of the input/output (I/O) bus, and buffer management. To achieve high performance, messaging layers should assign as much functionality as possible to the host. If the network interface has DMA capability, the I/O bus should be used asymmetrically, with the host processor moving data to the network and exploiting DMA to move data to the host. Finally, buffer management in the network coprocessor should be extremely simple, and queue structures should match between the network coprocessor and host memory. Detailed measurements show how each of these features contributes to high performance.
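To make the half-power point concrete, the following is a minimal sketch under a simple linear cost model, which is an assumption of this example and is not stated in the abstract: sending n bytes costs T(n) = t0 + n / r_inf, so delivered bandwidth is B(n) = r_inf * n / (n + n_half) with n_half = t0 * r_inf, and exactly half of the peak bandwidth is reached at n = n_half. The function names are hypothetical. The n_half values (54 and 4,409 bytes) and the 19.6 MB/s figure come from the abstract; the OC-3 line rate of 155.52 Mbit/s is the standard SONET value. The fractions printed are outputs of the idealized model, not measurements from the paper, and measured bandwidth curves need not follow the model exactly.

```python
# Sketch of the half-power point n_1/2 under an ASSUMED linear cost model:
#   T(n) = t0 + n / r_inf   (t0: per-message startup cost, r_inf: peak bandwidth)
# The model and all function names are illustrative, not taken from the paper.

def half_power_point(startup_s: float, peak_bytes_per_s: float) -> float:
    """Message size (bytes) at which half the peak bandwidth is delivered."""
    return startup_s * peak_bytes_per_s


def bandwidth_fraction(n_bytes: float, n_half: float) -> float:
    """Fraction of peak bandwidth delivered for an n_bytes message."""
    if n_bytes < 0 or n_half <= 0:
        raise ValueError("n_bytes must be >= 0 and n_half must be > 0")
    return n_bytes / (n_bytes + n_half)


# Half-power points quoted in the abstract (bytes).
N_HALF_FM = 54.0
N_HALF_MYRINET_API = 4409.0

# OC-3 line rate: 155.52 Mbit/s, converted to MB/s (10^6 bytes per second).
OC3_LINE_RATE_MB_S = 155.52 / 8

if __name__ == "__main__":
    for size in (16, 128, 1024, 8192):
        fm = bandwidth_fraction(size, N_HALF_FM)
        api = bandwidth_fraction(size, N_HALF_MYRINET_API)
        print(f"{size:5d} B  FM model: {fm:.3f}  Myrinet API model: {api:.3f}")
    print(f"OC-3 line rate: {OC3_LINE_RATE_MB_S:.2f} MB/s; FM peak quoted: 19.6 MB/s")
```

Under this model, a smaller n_half means short messages already use a large share of the available bandwidth, which is why the abstract reports the half-power point alongside latency and peak bandwidth.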

