{"title":"FPGA上可扩展IP查找引擎的架构和性能模型","authors":"Y. Yang, Yun Qu, Swapnil Haria, V. Prasanna","doi":"10.1109/HPSR.2013.6602306","DOIUrl":null,"url":null,"abstract":"We propose a unified methodology for optimizing IPv4 and IPv6 lookup engines based on the balanced range tree (BRTree) architecture on FPGA. A general BRTree-based IP lookup solution features one or more linear pipelines with a large and complex design space. To allow fast exploration of the design space, we develop a concise set of performance models to characterize the tradeoffs among throughput, table size, lookup latency, and resource requirement of the IP lookup engine. In particular, a simple but realistic model of DDR3 memory is used to accurately estimate the off-chip memory performance. The models are then utilized by the proposed methodology to optimize for high lookup rates, large prefix tables, and a fixed maximum lookup latency, respectively. In our prototyping scenarios, a state-of-the-art FPGA could support (1) up to 24 M IPv6 prefixes with 400 Mlps (million lookups per second); (2) up to 1.6 Blps (billion lookups per second) with 1.1 M IPv4 prefixes; and (3) up to 554 K IPv4 prefixes and 400 Mlps with a lookup latency bounded in 400 ns. All our designs achieve 5.6x - 70x the energy efficiency of TCAM, and have performance independent of the prefix distribution.","PeriodicalId":220418,"journal":{"name":"2013 IEEE 14th International Conference on High Performance Switching and Routing (HPSR)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Architecture and performance models for scalable IP lookup engines on FPGA\",\"authors\":\"Y. Yang, Yun Qu, Swapnil Haria, V. Prasanna\",\"doi\":\"10.1109/HPSR.2013.6602306\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose a unified methodology for optimizing IPv4 and IPv6 lookup engines based on the balanced range tree (BRTree) architecture on FPGA. A general BRTree-based IP lookup solution features one or more linear pipelines with a large and complex design space. To allow fast exploration of the design space, we develop a concise set of performance models to characterize the tradeoffs among throughput, table size, lookup latency, and resource requirement of the IP lookup engine. In particular, a simple but realistic model of DDR3 memory is used to accurately estimate the off-chip memory performance. The models are then utilized by the proposed methodology to optimize for high lookup rates, large prefix tables, and a fixed maximum lookup latency, respectively. In our prototyping scenarios, a state-of-the-art FPGA could support (1) up to 24 M IPv6 prefixes with 400 Mlps (million lookups per second); (2) up to 1.6 Blps (billion lookups per second) with 1.1 M IPv4 prefixes; and (3) up to 554 K IPv4 prefixes and 400 Mlps with a lookup latency bounded in 400 ns. All our designs achieve 5.6x - 70x the energy efficiency of TCAM, and have performance independent of the prefix distribution.\",\"PeriodicalId\":220418,\"journal\":{\"name\":\"2013 IEEE 14th International Conference on High Performance Switching and Routing (HPSR)\",\"volume\":\"43 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-07-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE 14th International Conference on High Performance Switching and Routing (HPSR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPSR.2013.6602306\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE 14th International Conference on High Performance Switching and Routing (HPSR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPSR.2013.6602306","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
摘要
我们提出了一种基于FPGA平衡范围树(BRTree)架构的统一方法来优化IPv4和IPv6查找引擎。一般基于brtree的IP查找解决方案具有一个或多个具有大型复杂设计空间的线性管道。为了允许快速探索设计空间,我们开发了一组简洁的性能模型来描述吞吐量、表大小、查找延迟和IP查找引擎的资源需求之间的权衡。特别地,一个简单而真实的DDR3存储器模型被用来准确地估计片外存储器的性能。然后,建议的方法利用这些模型分别针对高查找率、大前缀表和固定的最大查找延迟进行优化。在我们的原型场景中,最先进的FPGA可以支持(1)最多24 M IPv6前缀,速度为400 Mlps(每秒百万次查找);(2)高达1.6 Blps(每秒十亿次查找),使用1.1 M IPv4前缀;(3)最多554 K IPv4前缀和400 Mlps,查找延迟限制在400 ns。我们所有的设计都达到了TCAM的5.6 - 70倍的能源效率,并且具有独立于前缀分布的性能。
Architecture and performance models for scalable IP lookup engines on FPGA
We propose a unified methodology for optimizing IPv4 and IPv6 lookup engines based on the balanced range tree (BRTree) architecture on FPGA. A general BRTree-based IP lookup solution features one or more linear pipelines with a large and complex design space. To allow fast exploration of the design space, we develop a concise set of performance models to characterize the tradeoffs among throughput, table size, lookup latency, and resource requirement of the IP lookup engine. In particular, a simple but realistic model of DDR3 memory is used to accurately estimate the off-chip memory performance. The models are then utilized by the proposed methodology to optimize for high lookup rates, large prefix tables, and a fixed maximum lookup latency, respectively. In our prototyping scenarios, a state-of-the-art FPGA could support (1) up to 24 M IPv6 prefixes with 400 Mlps (million lookups per second); (2) up to 1.6 Blps (billion lookups per second) with 1.1 M IPv4 prefixes; and (3) up to 554 K IPv4 prefixes and 400 Mlps with a lookup latency bounded in 400 ns. All our designs achieve 5.6x - 70x the energy efficiency of TCAM, and have performance independent of the prefix distribution.