{"title":"A novel hybrid SRAM/DRAM memory architecture for fast packet buffers","authors":"A. Mutter","doi":"10.1145/1882486.1882533","DOIUrl":"https://doi.org/10.1145/1882486.1882533","url":null,"abstract":"This paper addresses the design of fast packet buffers for high speed Internet routers and switches. These buffers usually use a memory hierarchy that consists of expensive but fast SRAM and cheap but slow DRAM to meet both speed and capacity requirements. One challenge in building these packet buffers is to provide worst-case bandwidth guarantees and fixed latencies, so as not to stall pipelines or reduce throughput. My colleagues and I propose a novel packet buffer architecture along with a new memory management algorithm which reduces the amount of required SRAM compared to other architectures, e.g. by 73% for a 100 Gbps system using DDR3-DRAM. Furthermore, our architecture scales well with line rate.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126494564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Networking hardware: what drives innovation?","authors":"Jack Brassil, Jonathan M. Smith, F. Bonomi, K. Bergman, P. Congdon, I. Seskar, S. Muir","doi":"10.1145/1882486.1882488","DOIUrl":"https://doi.org/10.1145/1882486.1882488","url":null,"abstract":"","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133834860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting the internet hourglass: core strength vs. middle-age spread","authors":"Bruce S. Davie","doi":"10.1145/1882486.1882489","DOIUrl":"https://doi.org/10.1145/1882486.1882489","url":null,"abstract":"The threat of commoditization poses a real challenge for service providers. Offering only a \"plain vanilla\" IP packet delivery service limits the options for competitive differentiation. Conversely, embedding additional functionality in the network carries a number of risks -- decreased robustness and increased complexity, for example. The key to addressing this challenge is the careful selection of appropriate functionality to embed in the network. Functions should be added to the network only when they offer value to a wide range of applications, and they should not inhibit the correct operation of applications that do not need them. This talk addresses the question of how novel, useful functions might be embedded \"inside\" the network, and how best to evaluate candidate functions for inclusion. For device designers, it is important to understand not only what functions are needed in the network today, but also which ones might provide the most benefit in the future. Because of the uncertainty about exactly what future networks will be expected to do, functions that are selected for inclusion in network devices must be as general as possible, and they should not interfere with the correct operation of the network when they are not needed. Some functions are best implemented as an overlay, leaving the essential network-layer functionality unaffected, while others will need assistance from the fast-path forwarding hardware. We will consider examples of various functions that have been or could be added to \"core\" networks, aiming to understand the tradeoffs both among different functions to add and among different implementation approaches.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"190 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122665797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Divide and discriminate: algorithm for deterministic and fast hash lookups","authors":"D. Ficara, S. Giordano, S. Sushanth Kumar, Bill Lynch","doi":"10.1145/1882486.1882519","DOIUrl":"https://doi.org/10.1145/1882486.1882519","url":null,"abstract":"Exact and approximate membership lookups are among the most widely used primitives in a number of network applications. Hash tables are commonly used to implement these primitive functions as they provide O(1) operations at moderate load (table occupancy). However, at high load, collisions become prevalent in the table, which makes lookup highly non-deterministic and reduces the average performance. Slow and non-deterministic lookups are detrimental to the performance and scalability of modern platforms such as ASIC/FPGA and multi-core that use highly parallel compute and memory structures. To combat non-determinism and achieve high rate lookups, a recent series of papers employ compact on-chip memory that augments the main hash table and stores certain key information. Unfortunately, they require substantial on-chip memory space and bandwidth, and fail to provide a 100% guarantee on lookup rate. In this paper, we solve this with a novel construction that requires 10-fold smaller on-chip memory and guarantees that all lookups require a single hash table access at near full load. The on-chip memory uses only between 1 and 2 bits per item and also needs a small number of accesses (between two and four) per lookup. This represents a substantial improvement over previous schemes and therefore can help realize highly scalable and deterministic lookup tables in modern parallel platforms.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"1944 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129180424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and performance analysis of a DRAM-based statistics counter array architecture","authors":"Haiquan Zhao, Hao Wang, Bill Lin, Jun Xu","doi":"10.1145/1882486.1882512","DOIUrl":"https://doi.org/10.1145/1882486.1882512","url":null,"abstract":"The problem of efficiently maintaining a large number (say millions) of statistics counters that need to be updated at very high speeds (e.g. 40 Gb/s) has received considerable research attention in recent years. This problem arises in a variety of router management and data streaming applications where large arrays of counters are used to track various network statistics and implement various counting sketches. It proves too costly to store such large counter arrays entirely in SRAM while DRAM is viewed as too slow for providing wirespeed updates at such high speeds. In this paper, we propose a DRAM-based counter architecture that can effectively maintain wirespeed updates to large counter arrays. The proposed approach is based on the observation that modern commodity DRAM architectures, driven by aggressive performance roadmaps for consumer applications (e.g. video games), have advanced architecture features that can be exploited to make a DRAM-based solution practical. In particular, we propose a randomized DRAM architecture that can harness the performance of modern commodity DRAM offerings by interleaving counter updates to multiple memory banks. The proposed architecture makes use of a simple randomization scheme, a small cache, and small request queues to statistically guarantee a near-perfect load-balancing of counter updates to the DRAM banks. The statistical guarantee of the proposed scheme is proven using a novel combination of convex ordering and large deviation theory. Our proposed counter scheme can support arbitrary increments and decrements at wirespeed, and it can support different number representations, including both integer and floating point number representations.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126537198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Range Tries for scalable address lookup","authors":"I. Sourdis, G. Stefanakis, Ruben de Smet, G. Gaydadjiev","doi":"10.1145/1882486.1882520","DOIUrl":"https://doi.org/10.1145/1882486.1882520","url":null,"abstract":"In this paper we introduce the Range Trie, a new multiway tree data structure for address lookup. Each Range Trie node maps to an address range [Na, Nb) and performs multiple comparisons to determine the subrange an incoming address belongs to. The Range Trie improves on existing Range Trees by allowing comparisons shorter than the address width. The maximum comparison length in a Range Trie node is ⌈log2(Nb − Na)⌉ bits. Address parts can be shared among multiple concurrent comparisons or even omitted. Addresses can be properly aligned to further reduce the required address bits per comparison. In so doing, Range Tries can store in a single tree node more address bounds to be compared. Given a memory bandwidth, more comparisons are performed in a single step, reducing lookup latency, memory accesses per lookup, and overall memory requirements. Latency and memory size scale better than related works as the address width and the number of stored prefixes increase. Considering a memory bandwidth of 256 bits per cycle, five to seven Range Trie levels are sufficient to store half a million IPv4 or IPv6 prefixes, while memory size is comparable and in many cases better than linear search. We describe a Range Trie hardware design and evaluate our approach in terms of performance, area cost and power consumption. Range Trie 90-nm ASIC implementations, storing 0.5 million IPv4 and IPv6 prefixes, perform over 500 million lookups per second (OC-3072) and consume 3.9 and 11.4 Watts respectively.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129728562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating regular expression matching engines on network and general purpose processors","authors":"M. Becchi, Charlie Wiseman, P. Crowley","doi":"10.1145/1882486.1882495","DOIUrl":"https://doi.org/10.1145/1882486.1882495","url":null,"abstract":"In recent years we have witnessed a proliferation of data structure and algorithm proposals for efficient deep packet inspection on memory based architectures. In parallel, we have observed an increasing interest in network processors as target architectures for high performance networking applications. In this paper we explore design alternatives in the implementation of regular expression matching architectures on network processors (NPs) and general purpose processors (GPPs). Specifically, we present a performance evaluation on an Intel IXP2800 NP, on an Intel Xeon GPP and on a multiprocessor system consisting of four AMD Opteron 850 cores. Our study shows how to exploit the Intel IXP2800 architectural features in order to maximize system throughput, identifies and evaluates algorithmic and architectural trade-offs and limitations, and highlights how the presence of caches affects the overall performance. We provide an implementation of our NP designs within the Open Network Laboratory (http://www.onl.wustl.edu).","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"6 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134078063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a scalable nanophotonic interconnect for future multicores","authors":"Avinash Karanth Kodi, R. Morris","doi":"10.1145/1882486.1882516","DOIUrl":"https://doi.org/10.1145/1882486.1882516","url":null,"abstract":"As the communication-centric computing paradigm gathers momentum due to increased wire delays and excess power dissipation with technology scaling, researchers have focused their attention on developing alternate technology solutions for Network-on-Chip (NoC) architectures. One potential solution is nanophotonics because of higher bandwidth, reduced power dissipation and increased wiring simplification. In this paper, we propose PROPEL, a balanced power and area-efficient on-chip photonic interconnect for future multicores. PROPEL overcomes two fundamental issues facing NoC architectures, namely power dissipation and area overhead, by a combination of multiplexing techniques (wavelength and space) and by exploiting the recent advances in optical component design space. We also propose a scalable version of PROPEL, called E-PROPEL, which can scale to 256 cores. Our results indicate that PROPEL and E-PROPEL are power, cost and area-effective networks when compared to competing on-chip optical topologies when the number of optical components and overall power loss in the network are considered. Simulation results on synthetic traffic indicate that PROPEL performs better (throughput and power) than electrical and optical topologies.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133107030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ISP managed peer-to-peer","authors":"S. James, P. Crowley","doi":"10.1145/1882486.1882525","DOIUrl":"https://doi.org/10.1145/1882486.1882525","url":null,"abstract":"Despite their widespread popularity, peer-to-peer (P2P) systems engender continuing controversy. To reduce P2P's high network cost, Internet Service Providers (ISPs) have installed network devices that detect and block P2P traffic. These devices angered subscribers because they also increased download times. For that reason, application developers have begun obfuscating their traffic to avoid ISP-detection. This \"cat and mouse\" game portends a broader shift. If ISPs remedy the relationship with P2P developers now, developers may cooperate with them to develop network-efficient protocols in the future. Our proposal makes a noteworthy contribution in this direction.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114814597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An ultra high throughput and memory efficient pipeline architecture for multi-match packet classification without TCAMs","authors":"Yang Xu, Zhaobo Liu, Zhuoyuan Zhang, H. J. Chao","doi":"10.1145/1882486.1882537","DOIUrl":"https://doi.org/10.1145/1882486.1882537","url":null,"abstract":"The emergence of new network applications like network intrusion detection systems, packet-level accounting, and load-balancing requires packet classification to report all matched rules, instead of only the best matched rule. Although several schemes have been proposed recently to address the multi-match packet classification problem, most of them require either huge memory or expensive Ternary Content Addressable Memory (TCAM) to store the intermediate data structure, or suffer from steep performance degradation under certain types of classifiers. In this paper, we decompose the operation of multi-match packet classification from the complicated multi-dimensional search to several single-dimensional searches, and present an asynchronous pipeline architecture based on a signature tree structure to combine the intermediate results returned from single-dimensional searches. By spreading edges of the signature tree in multiple hash tables at different stages of the pipeline, the pipeline can achieve a high throughput via the inter-stage parallel access to hash tables. To exploit further intra-stage parallelism, two edge-grouping algorithms are designed to evenly divide the edges associated with each stage into multiple work-conserving hash tables with minimum overhead. Extensive simulation using realistic classifiers and traffic traces shows that the proposed pipeline architecture outperforms HyperCut and B2PC schemes in classification speed by at least one order of magnitude, with a similar storage requirement. Particularly, with different types of classifiers of 4K rules, the proposed pipeline architecture is able to achieve a throughput between 19.5 Gbps and 91 Gbps.","PeriodicalId":329300,"journal":{"name":"Symposium on Architectures for Networking and Communications Systems","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115329721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}