High-speed stateful packet classifier based on TSS algorithm optimized for off-chip memories

2021 24th International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS). Pub Date: 2021-04-07. DOI: 10.1109/DDECS52668.2021.9417060

Michal Orsák, Tomás Benes


Citations: 3

Abstract

We present a modular out-of-order architecture for stateful packet classification. The architecture stores rules and their state information in DDR4 SDRAM, which allows it to support millions of rules. The memory access pattern generated by network traffic significantly degrades DDR4 performance; our architecture contains a cache and aggregation queues to counter this effect. Additionally, the memory subsystem supports read cancellation and uses an out-of-order pipeline to further improve main-memory efficiency. Rule-set updates are implemented as non-blocking operations that can be interleaved with lookups without any performance decrease, so a rule update takes the same execution time as a rule lookup. The architecture is optimized for modern datacenter network traffic and has a small on-chip memory footprint, making it suitable as an accelerator for Open vSwitch. Configured with 1 million exact-match rules, it processes up to 202 Gbit/s (300 Mp/s) in the average case and 51 Gbit/s (76 Mp/s) in the worst case using a common dual-channel 64-bit DDR4-2666 memory. Excluding cache memory, it uses fewer FPGA resources than the Xilinx MIG DDR4 controllers, the de facto industry standard. The proposed architecture enables commodity FPGA cards, which are commonly equipped with DDR4, to process 100 Gbit/s, significantly reducing the cost of 100G SmartNICs.
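
The title names the Tuple Space Search (TSS) algorithm, which is also the scheme behind the Open vSwitch software classifier. As a rough illustration only (not the authors' hardware design), TSS groups rules by the combination of prefix lengths they use (a "tuple"), stores each group in an exact-match hash table, and classifies a packet by probing every table with the correspondingly masked header and keeping the highest-priority hit. The sketch assumes a hypothetical two-field header (source and destination IPv4) and uses Python dicts in place of the DDR4-resident hash tables; the names `TupleSpaceClassifier`, `insert`, `lookup`, `apply_shape`, and `ipv4` are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical two-field header: (source IPv4, destination IPv4), 32 bits each.
FIELD_WIDTHS: Tuple[int, ...] = (32, 32)

Header = Tuple[int, ...]  # one integer per field
Shape = Tuple[int, ...]   # prefix length per field; one shape = one "tuple"


def ipv4(a: int, b: int, c: int, d: int) -> int:
    """Pack a dotted-quad IPv4 address into a 32-bit integer."""
    return (a << 24) | (b << 16) | (c << 8) | d


def mask(value: int, prefix_len: int, width: int) -> int:
    """Keep only the top `prefix_len` bits of a `width`-bit field."""
    if prefix_len == 0:
        return 0
    return value & (((1 << prefix_len) - 1) << (width - prefix_len))


def apply_shape(header: Header, shape: Shape) -> Header:
    """Mask every field of a header to the prefix lengths of one tuple."""
    return tuple(mask(v, p, w) for v, p, w in zip(header, shape, FIELD_WIDTHS))


@dataclass
class TupleSpaceClassifier:
    # One exact-match hash table per tuple (shape); values are (priority, action).
    tables: Dict[Shape, Dict[Header, Tuple[int, str]]] = field(default_factory=dict)

    def insert(self, rule: Header, shape: Shape, priority: int, action: str) -> None:
        """Add a rule to the hash table of its tuple, creating the table if needed."""
        key = apply_shape(rule, shape)
        self.tables.setdefault(shape, {})[key] = (priority, action)

    def lookup(self, header: Header) -> Optional[str]:
        """Probe every tuple once and return the action of the highest-priority hit."""
        best: Optional[Tuple[int, str]] = None
        for shape, table in self.tables.items():
            hit = table.get(apply_shape(header, shape))
            if hit is not None and (best is None or hit[0] > best[0]):
                best = hit
        return None if best is None else best[1]
```

In this sketch, a rule set made only of exact-match rules (as in the 1-million-rule configuration above) collapses into a single tuple, so each lookup is one hash probe; every additional tuple costs one more probe, which in a DDR4-backed design would mean one more memory access.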

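The two throughput figures in the abstract are mutually consistent if every packet occupies 84 bytes on the wire: a minimum-size 64-byte Ethernet frame plus 20 bytes of preamble, start-of-frame delimiter, and inter-frame gap. The abstract does not state this convention, so treat it as an assumption; the snippet only converts between packet rate and line rate under it, and the function names are illustrative.

```python
# Assumption: minimum-size Ethernet frames, counted with their on-wire overhead.
MIN_FRAME_BYTES = 64
WIRE_OVERHEAD_BYTES = 7 + 1 + 12  # preamble + start-of-frame delimiter + inter-frame gap
WIRE_BYTES = MIN_FRAME_BYTES + WIRE_OVERHEAD_BYTES  # 84 bytes per packet


def line_rate_gbps(mpps: float, wire_bytes: int = WIRE_BYTES) -> float:
    """Line rate in Gbit/s needed to carry `mpps` million packets per second."""
    return mpps * 1e6 * wire_bytes * 8 / 1e9


def packet_rate_mpps(gbps: float, wire_bytes: int = WIRE_BYTES) -> float:
    """Packet rate in Mp/s carried by a `gbps` Gbit/s link."""
    return gbps * 1e9 / (wire_bytes * 8) / 1e6


if __name__ == "__main__":
    print(f"average case: {line_rate_gbps(300):.1f} Gbit/s")  # 201.6
    print(f"worst case:   {line_rate_gbps(76):.1f} Gbit/s")   # 51.1
    print(f"100 Gbit/s at 64 B frames: {packet_rate_mpps(100):.1f} Mp/s")  # 148.8
```

Rounded, 201.6 and 51.1 Gbit/s match the reported 202 and 51 Gbit/s. For comparison, carrying 100 Gbit/s of minimum-size frames requires about 148.8 Mp/s.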
