Daniel Rozhko, Geoffrey Elliott, D. Ly-Ma, P. Chow, H. Jacobsen
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017-02-22. DOI: 10.1145/3020078.3021752 (https://doi.org/10.1145/3020078.3021752)
Packet Matching on FPGAs Using HMC Memory: Towards One Million Rules
Packet processing systems increasingly need larger rulesets to satisfy the needs of deep-network intrusion prevention and cluster computing. FPGA-based implementations of packet processing systems have been proposed, but their use of on-chip memory limits the number of rules these systems can maintain. Off-chip memories have traditionally been too slow to enable meaningful processing rates, but in this work we present a packet processing system that uses the much faster Hybrid Memory Cube (HMC) technology, enabling larger rulesets at usable line-rates. The proposed architecture streams rules from the HMC memory to a packet matching engine, using prefetching to hide the HMC access latency. The packet matching engine is replicated to process multiple packets in parallel. The final system, implemented on a Xilinx Kintex UltraScale 060, processes 160 packets in parallel, achieving a 10 Gbps line-rate with approximately 1500 rules and a 16 Mbps line-rate with 1M rules. To the best of our knowledge, this is the first hardware solution capable of maintaining rulesets of this size. We present this work as an exploration of the application of HMCs to packet processing and as a first step in achieving a processing capability of a million rules at usable line-rates.
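The streaming idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the abstract does not specify the rule format, so the value/mask rule, the first-match priority, and the names `Rule`, `match_batch`, and `scaled_rate_bps` are all hypothetical. The model fetches each rule once and compares it against every packet header in a batch, which is the role the replicated matching engines play. The helper at the end assumes that line-rate falls in inverse proportion to ruleset size, which is roughly what the two reported operating points suggest.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical ternary rule: a header matches if it equals `value` on the bits set in `mask`."""
    value: int
    mask: int
    rule_id: int


def match_batch(rules: Sequence[Rule], headers: Sequence[int]) -> List[Optional[int]]:
    """Stream the ruleset once and test each rule against every header in the batch.

    Returns, per header, the id of the first matching rule (assumed priority
    order), or None if no rule matches.
    """
    result: List[Optional[int]] = [None] * len(headers)
    for rule in rules:  # one pass over the rules, as if streamed from memory
        for i, header in enumerate(headers):  # hardware would do these comparisons in parallel
            if result[i] is None and (header & rule.mask) == (rule.value & rule.mask):
                result[i] = rule.rule_id
    return result


def scaled_rate_bps(ref_rate_bps: float, ref_rules: int, rules: int) -> float:
    """Estimate line-rate at `rules` rules, ASSUMING rate is inversely proportional to ruleset size."""
    return ref_rate_bps * ref_rules / rules


if __name__ == "__main__":
    rules = [
        Rule(value=0x0A000000, mask=0xFF000000, rule_id=1),  # matches 10.x.x.x
        Rule(value=0x00000000, mask=0x00000000, rule_id=2),  # wildcard, matches anything
    ]
    headers = [0x0A010203, 0xC0A80001]
    print(match_batch(rules, headers))
    # Extrapolate from the reported 10 Gbps at roughly 1500 rules to 1M rules.
    print(scaled_rate_bps(10e9, 1500, 1_000_000))
```

Under the inverse-proportional assumption, extrapolating from 10 Gbps at about 1500 rules gives about 15 Mbps at one million rules, which is in line with the 16 Mbps the abstract reports. Because each rule is fetched once per batch rather than once per packet, a larger batch spreads the memory traffic over more packets.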