Symposium on Architectures for Networking and Communications Systems: Latest Papers (Page 2)

A novel hybrid SRAM/DRAM memory architecture for fast packet buffers

Symposium on Architectures for Networking and Communications Systems Pub Date : 2009-10-19 DOI: 10.1145/1882486.1882533

A. Mutter

Citations: 0

Networking hardware: what drives innovation?

Symposium on Architectures for Networking and Communications Systems Pub Date : 2009-10-19 DOI: 10.1145/1882486.1882488

Jack Brassil, Jonathan M. Smith, F. Bonomi, K. Bergman, P. Congdon, I. Seskar, S. Muir

Citations: 0

Revisiting the internet hourglass: core strength vs. middle-age spread

Symposium on Architectures for Networking and Communications Systems Pub Date : 2009-10-19 DOI: 10.1145/1882486.1882489

Bruce S. Davie

Abstract: The threat of commoditization poses a real challenge for service providers. Offering only a "plain vanilla" IP packet delivery service limits the options for competitive differentiation. Conversely, embedding additional functionality in the network carries a number of risks, such as decreased robustness and increased complexity. The key to addressing this challenge is the careful selection of appropriate functionality to embed in the network. Functions should be added to the network only when they offer value to a wide range of applications, and they should not inhibit the correct operation of applications that do not need them. This talk addresses the question of how novel, useful functions might be embedded "inside" the network, and how best to evaluate candidate functions for inclusion.

For device designers, it is important to understand not only what functions are needed in the network today, but also which ones might provide the most benefit in the future. Because of the uncertainty about exactly what future networks will be expected to do, functions that are selected for inclusion in network devices must be as general as possible, and they should not interfere with the correct operation of the network when they are not needed. Some functions are best implemented as an overlay, leaving the essential network-layer functionality unaffected, while others will need assistance from the fast-path forwarding hardware. We will consider examples of various functions that have been or could be added to "core" networks, aiming to understand the tradeoffs both among different functions to add and among different implementation approaches.

Citations: 0

Divide and discriminate: algorithm for deterministic and fast hash lookups

Symposium on Architectures for Networking and Communications Systems Pub Date : 2009-10-19 DOI: 10.1145/1882486.1882519

D. Ficara, S. Giordano, S. Sushanth Kumar, Bill Lynch

Abstract: Exact and approximate membership lookups are among the most widely used primitives in a number of network applications. Hash tables are commonly used to implement these primitive functions as they provide O(1) operations at moderate load (table occupancy). However, at high load, collisions become prevalent in the table, which makes lookup highly non-deterministic and reduces the average performance. Slow and non-deterministic lookups are detrimental to the performance and scalability of modern platforms such as ASIC/FPGA and multi-core that use highly parallel compute and memory structures.

To combat non-determinism and achieve high-rate lookups, a recent series of papers employs compact on-chip memory that augments the main hash table and stores certain key information. Unfortunately, these schemes require substantial on-chip memory space and bandwidth, and fail to provide a 100% guarantee on lookup rate. In this paper, we solve this with a novel construction that requires 10-fold smaller on-chip memory and guarantees that all lookups require a single hash table access at near full load. The on-chip memory uses only between 1 and 2 bits per item and needs a small number of accesses (between two and four) per lookup. This represents a substantial improvement over previous schemes and therefore can help realize highly scalable and deterministic lookup tables in modern parallel platforms.
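The problem statement in this abstract (collisions making lookup cost non-deterministic as load grows) can be illustrated with a minimal sketch. It does not implement the paper's construction. It only models a plain chained hash table, and the function name `worst_bucket`, the seeded random generator standing in for an ideal hash function, and the table size are all assumptions made for illustration.

```python
import random
from collections import Counter


def worst_bucket(num_keys, num_buckets, seed=1):
    """Return the largest bucket occupancy after inserting num_keys keys.

    Each key is placed in a uniformly random bucket, which is a stand-in
    (assumption) for an ideal hash function. In a chained table the largest
    occupancy is the worst-case number of entries a single lookup examines.
    """
    rng = random.Random(seed)
    occupancy = Counter(rng.randrange(num_buckets) for _ in range(num_keys))
    return max(occupancy.values())


if __name__ == "__main__":
    buckets = 4096  # hypothetical table size
    for load in (0.1, 0.5, 1.0):
        keys = int(load * buckets)
        worst = worst_bucket(keys, buckets)
        print(f"load {load:.1f}: worst-case entries examined = {worst}")
```

Running the loop shows how the worst-case lookup cost of an unaugmented table varies with occupancy, which is the behaviour the abstract describes as the motivation for a single-access guarantee.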

Citations: 10

Design and performance analysis of a DRAM-based statistics counter array architecture

Symposium on Architectures for Networking and Communications Systems Pub Date : 2009-10-19 DOI: 10.1145/1882486.1882512

Haiquan Zhao, Hao Wang, Bill Lin, Jun Xu

Abstract: The problem of efficiently maintaining a large number (say, millions) of statistics counters that need to be updated at very high speeds (e.g. 40 Gb/s) has received considerable research attention in recent years. This problem arises in a variety of router management and data streaming applications where large arrays of counters are used to track various network statistics and implement various counting sketches. It proves too costly to store such large counter arrays entirely in SRAM, while DRAM is viewed as too slow for providing wirespeed updates at such high speeds.

In this paper, we propose a DRAM-based counter architecture that can effectively maintain wirespeed updates to large counter arrays. The proposed approach is based on the observation that modern commodity DRAM architectures, driven by aggressive performance roadmaps for consumer applications (e.g. video games), have advanced architecture features that can be exploited to make a DRAM-based solution practical. In particular, we propose a randomized DRAM architecture that can harness the performance of modern commodity DRAM offerings by interleaving counter updates to multiple memory banks. The proposed architecture makes use of a simple randomization scheme, a small cache, and small request queues to statistically guarantee a near-perfect load-balancing of counter updates to the DRAM banks. The statistical guarantee of the proposed scheme is proven using a novel combination of convex ordering and large deviation theory. Our proposed counter scheme can support arbitrary increments and decrements at wirespeed, and it can support different number representations, including both integer and floating point number representations.
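The idea of interleaving counter updates across banks through randomization can be sketched as follows. This is an illustration under stated assumptions, not the architecture from the paper: the cache and request queues are omitted, and the bank count, counter count, lookup-table mapping, and the names `make_bank_map` and `bank_loads` are all hypothetical.

```python
import random
from collections import Counter

NUM_BANKS = 8        # hypothetical number of DRAM banks
NUM_COUNTERS = 1024  # hypothetical counter array size


def make_bank_map(num_counters, num_banks, seed=7):
    """Assign every counter index to a pseudo-randomly chosen bank, fixed at setup."""
    rng = random.Random(seed)
    return [rng.randrange(num_banks) for _ in range(num_counters)]


def bank_loads(updates, bank_of):
    """Count how many of the given counter updates land on each bank."""
    return Counter(bank_of(idx) for idx in updates)


# A strided update pattern: every touched counter index is a multiple of NUM_BANKS.
updates = list(range(0, NUM_COUNTERS, NUM_BANKS))  # 128 updates

# Naive interleaving by low-order index bits sends the whole stride to one bank.
naive = bank_loads(updates, lambda idx: idx % NUM_BANKS)

# Randomized placement spreads the same updates over several banks.
bank_map = make_bank_map(NUM_COUNTERS, NUM_BANKS)
randomized = bank_loads(updates, lambda idx: bank_map[idx])

print("naive max bank load:     ", max(naive.values()))
print("randomized max bank load:", max(randomized.values()))
```

The strided pattern is the kind of input that defeats a fixed modulo mapping. A randomized mapping removes the dependence on the access pattern, which is why a statistical rather than worst-case guarantee is the natural claim for such a scheme.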

Citations: 18

Range Tries for scalable address lookup

Symposium on Architectures for Networking and Communications Systems Pub Date : 2009-10-19 DOI: 10.1145/1882486.1882520

I. Sourdis, G. Stefanakis, Ruben de Smet, G. Gaydadjiev

Abstract: In this paper we introduce the Range Trie, a new multiway tree data structure for address lookup. Each Range Trie node maps to an address range [Na, Nb) and performs multiple comparisons to determine the subrange an incoming address belongs to. The Range Trie improves on existing Range Trees by allowing comparisons shorter than the address width. The maximum comparison length in a Range Trie node is ⌈log2(Nb − Na)⌉ bits. Address parts can be shared among multiple concurrent comparisons or even omitted. Addresses can be properly aligned to further reduce the required address bits per comparison. In so doing, Range Tries can store more address bounds to be compared in a single tree node. For a given memory bandwidth, more comparisons are performed in a single step, reducing lookup latency, memory accesses per lookup, and overall memory requirements. Latency and memory size scale better than in related work as the address width and the number of stored prefixes increase. Considering a memory bandwidth of 256 bits per cycle, five to seven Range Trie levels are sufficient to store half a million IPv4 or IPv6 prefixes, while memory size is comparable to, and in many cases better than, linear search. We describe a Range Trie hardware design and evaluate our approach in terms of performance, area cost and power consumption. Range Trie 90-nm ASIC implementations, storing 0.5 million IPv4 and IPv6 prefixes, perform over 500 million lookups per second (OC-3072) and consume 3.9 and 11.4 Watts respectively.
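The bound on comparison length stated in this abstract can be worked through with a small sketch. It assumes the bracketed expression is a ceiling, so a node covering [Na, Nb) needs at most ceil(log2(Nb - Na)) bits per comparison. The function name `max_comparison_bits` and the example ranges are hypothetical, and the sketch computes only this bound, not any part of the Range Trie structure itself.

```python
def max_comparison_bits(na, nb):
    """Bits needed to tell apart all addresses in [na, nb): ceil(log2(nb - na)).

    (nb - na - 1).bit_length() equals that ceiling for every positive range
    size and avoids floating-point rounding on wide (e.g. IPv6) addresses.
    """
    if nb <= na:
        raise ValueError("range [na, nb) must be non-empty")
    return (nb - na - 1).bit_length()


# A node covering a /24-sized IPv4 range needs 8-bit comparisons, not 32-bit ones.
na = 0xC0A80000  # 192.168.0.0
nb = 0xC0A80100  # 192.168.1.0, exclusive upper bound
print(max_comparison_bits(na, nb))      # 8

# The root node covering the whole IPv4 space still needs the full width.
print(max_comparison_bits(0, 2 ** 32))  # 32
```

The narrower the range a node covers, the fewer bits each stored bound needs, which is what lets more bounds fit into one fixed-width memory word.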

Citations: 14

Evaluating regular expression matching engines on network and general purpose processors

Symposium on Architectures for Networking and Communications Systems Pub Date : 2009-10-19 DOI: 10.1145/1882486.1882495

M. Becchi, Charlie Wiseman, P. Crowley

Citations: 64

Design of a scalable nanophotonic interconnect for future multicores

Symposium on Architectures for Networking and Communications Systems Pub Date : 2009-10-19 DOI: 10.1145/1882486.1882516

Avinash Karanth Kodi, R. Morris

Abstract: As the communication-centric computing paradigm gathers momentum due to increased wire delays and excess power dissipation with technology scaling, researchers have focused their attention on developing alternative technology solutions for Network-on-Chip (NoC) architectures. One potential solution is nanophotonics because of its higher bandwidth, reduced power dissipation and simpler wiring. In this paper, we propose PROPEL, a balanced power- and area-efficient on-chip photonic interconnect for future multicores. PROPEL overcomes two fundamental issues facing NoC architectures, namely power dissipation and area overhead, by combining multiplexing techniques (wavelength and space) and by exploiting recent advances in the optical component design space. We also propose a scalable version of PROPEL, called E-PROPEL, which can scale to 256 cores. Our results indicate that PROPEL and E-PROPEL are power-, cost- and area-effective networks compared to competing on-chip optical topologies when the number of optical components and the overall power loss in the network are considered. Simulation results on synthetic traffic indicate that PROPEL performs better (throughput and power) than electrical and optical topologies.

Citations: 6

ISP managed peer-to-peer

Symposium on Architectures for Networking and Communications Systems Pub Date : 2009-10-19 DOI: 10.1145/1882486.1882525

S. James, P. Crowley

Citations: 7

An ultra high throughput and memory efficient pipeline architecture for multi-match packet classification without TCAMs

Symposium on Architectures for Networking and Communications Systems Pub Date : 2009-10-19 DOI: 10.1145/1882486.1882537

Yang Xu, Zhaobo Liu, Zhuoyuan Zhang, H. J. Chao

Abstract: The emergence of new network applications such as network intrusion detection systems, packet-level accounting, and load balancing requires packet classification to report all matched rules, instead of only the best matched rule. Although several schemes have been proposed recently to address the multi-match packet classification problem, most of them either require huge memory or expensive Ternary Content Addressable Memory (TCAM) to store the intermediate data structure, or suffer from steep performance degradation under certain types of classifiers. In this paper, we decompose the operation of multi-match packet classification from a complicated multi-dimensional search into several single-dimensional searches, and present an asynchronous pipeline architecture based on a signature tree structure to combine the intermediate results returned from the single-dimensional searches. By spreading the edges of the signature tree across multiple hash tables at different stages of the pipeline, the pipeline can achieve a high throughput via inter-stage parallel access to the hash tables. To further exploit intra-stage parallelism, two edge-grouping algorithms are designed to evenly divide the edges associated with each stage into multiple work-conserving hash tables with minimum overhead. Extensive simulation using realistic classifiers and traffic traces shows that the proposed pipeline architecture outperforms the HyperCut and B2PC schemes in classification speed by at least one order of magnitude, with a similar storage requirement. In particular, with different types of classifiers of 4K rules, the proposed pipeline architecture is able to achieve a throughput between 19.5 Gbps and 91 Gbps.
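The decomposition described in this abstract (several single-dimensional searches whose results are then combined to report every matching rule) can be sketched in a few lines. This is a simplified illustration, not the paper's pipeline: a linear scan stands in for each single-dimensional lookup, a set intersection stands in for the signature tree, and the `Rule` class, the two fields, and the example rule set are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: int
    src: tuple    # inclusive (low, high) source-address range
    dport: tuple  # inclusive (low, high) destination-port range


# Hypothetical two-field classifier.
RULES = [
    Rule(0, (0, 255), (80, 80)),
    Rule(1, (0, 127), (0, 1023)),
    Rule(2, (128, 255), (0, 65535)),
    Rule(3, (0, 255), (0, 65535)),
]


def match_field(rules, field, value):
    """Single-dimensional search: IDs of rules whose range on `field` covers value."""
    matched = set()
    for rule in rules:
        low, high = getattr(rule, field)
        if low <= value <= high:
            matched.add(rule.rule_id)
    return matched


def multi_match(rules, src, dport):
    """Report all rules matching the packet, not only the best one.

    A rule matches only if it appears in every per-field result, so the
    per-field ID sets are intersected.
    """
    return sorted(match_field(rules, "src", src) & match_field(rules, "dport", dport))


print(multi_match(RULES, src=10, dport=80))   # [0, 1, 3]
print(multi_match(RULES, src=200, dport=22))  # [2, 3]
```

Each per-field search is independent of the others, which is what makes the decomposed form amenable to parallel or pipelined execution. The cost moves into the combining step, which is the part the signature tree is designed to handle.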

Citations: 5