RoSR: A Novel Selective Retransmission FPGA Architecture for RDMA NICs

IF 1.4 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Computer Architecture Letters Pub Date : 2025-07-31 DOI:10.1109/LCA.2025.3594110

Mengting Zhang;Zhichuan Guo;Shining Sun

{"title":"RoSR: A Novel Selective Retransmission FPGA Architecture for RDMA NICs","authors":"Mengting Zhang;Zhichuan Guo;Shining Sun","doi":"10.1109/LCA.2025.3594110","DOIUrl":null,"url":null,"abstract":"Remote Direct Memory Access (RDMA) enables low-latency datacenter networks but suffers from inefficient loss recovery using Go-Back-N (GBN). GBN retransmits entire packet windows, degrading Flow Completion Time (FCT) under congestion. We introduce RoSR, a novel selective retransmission architecture for Field-Programmable Gate Array (FPGA)-based RDMA NICs that supports hardware-accelerated direct writes of out-of-order (OoO) packets. RoSR supports efficient OoO packet reception and enables fine-grained retransmission using a dynamic shared bitmap for packet tracking. By extending the RDMA over Converged Ethernet version 2 (RoCEv2) packet format, RoSR facilitates selective retransmission. It triggers retransmissions via timeouts using bitmap blocks and introduces new Nack-bitmap and rd-req-bitmap messages for loss reporting. Under 1% packet loss, RoSR achieves up to 13.5× (RDMA Write) and 15.6× (RDMA Read) higher throughput than Xilinx ERNIC. In NS-3 simulations using the HPCC RDMA stack, RoSR reduces FCT slowdown by 3× to 6× compared to GBN across various packet loss rates, congestion control algorithms (DCQCN, HPCC, Timely), and traffic patterns, while maintaining robustness under high round-trip time (RTT) conditions.","PeriodicalId":51248,"journal":{"name":"IEEE Computer Architecture Letters","volume":"24 2","pages":"269-272"},"PeriodicalIF":1.4000,"publicationDate":"2025-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Computer Architecture Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11106222/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

Abstract

Remote Direct Memory Access (RDMA) enables low-latency datacenter networks but suffers from inefficient loss recovery using Go-Back-N (GBN). GBN retransmits entire packet windows, degrading Flow Completion Time (FCT) under congestion. We introduce RoSR, a novel selective retransmission architecture for Field-Programmable Gate Array (FPGA)-based RDMA NICs that supports hardware-accelerated direct writes of out-of-order (OoO) packets. RoSR supports efficient OoO packet reception and enables fine-grained retransmission using a dynamic shared bitmap for packet tracking. By extending the RDMA over Converged Ethernet version 2 (RoCEv2) packet format, RoSR facilitates selective retransmission. It triggers retransmissions via timeouts using bitmap blocks and introduces new Nack-bitmap and rd-req-bitmap messages for loss reporting. Under 1% packet loss, RoSR achieves up to 13.5× (RDMA Write) and 15.6× (RDMA Read) higher throughput than Xilinx ERNIC. In NS-3 simulations using the HPCC RDMA stack, RoSR reduces FCT slowdown by 3× to 6× compared to GBN across various packet loss rates, congestion control algorithms (DCQCN, HPCC, Timely), and traffic patterns, while maintaining robustness under high round-trip time (RTT) conditions.

查看原文本刊更多论文

一种新的RDMA网卡选择性重传FPGA架构

远程直接内存访问（RDMA）支持低延迟的数据中心网络，但使用Go-Back-N （GBN）时存在效率低下的损失恢复问题。GBN重传整个数据包窗口，降低了拥塞下的流完成时间（FCT）。我们介绍了RoSR，一种新的选择性重传架构，用于基于现场可编程门阵列（FPGA）的RDMA网卡，支持硬件加速的乱序（OoO）数据包的直接写入。RoSR支持有效的OoO数据包接收，并使用动态共享位图实现数据包跟踪的细粒度重传。通过扩展RDMA over Converged Ethernet version 2 （RoCEv2）数据包格式，RoSR促进了选择性重传。它通过使用位图块的超时触发重传，并为丢失报告引入了新的ack-bitmap和rd-req-bitmap消息。在丢包率为1%的情况下，RoSR的吞吐量比Xilinx ERNIC高13.5倍（RDMA Write）和15.6倍（RDMA Read）。在使用HPCC RDMA堆栈的NS-3模拟中，与GBN相比，RoSR在各种丢包率、拥塞控制算法（DCQCN、HPCC、Timely）和流量模式下将FCT减速减少了3到6倍，同时在高往返时间（RTT）条件下保持鲁棒性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IEEE Computer Architecture Letters COMPUTER SCIENCE, HARDWARE & ARCHITECTURE-

CiteScore

4.60

自引率

4.30%

发文量

期刊介绍： IEEE Computer Architecture Letters is a rigorously peer-reviewed forum for publishing early, high-impact results in the areas of uni- and multiprocessor computer systems, computer architecture, microarchitecture, workload characterization, performance evaluation and simulation techniques, and power-aware computing. Submissions are welcomed on any topic in computer architecture, especially but not limited to: microprocessor and multiprocessor systems, microarchitecture and ILP processors, workload characterization, performance evaluation and simulation techniques, compiler-hardware and operating system-hardware interactions, interconnect architectures, memory and cache systems, power and thermal issues at the architecture level, I/O architectures and techniques, independent validation of previously published results, analysis of unsuccessful techniques, domain-specific processor architectures (e.g., embedded, graphics, network, etc.), real-time and high-availability architectures, reconfigurable systems.