International Conference on Parallel Processing, 2004. ICPP 2004. Latest papers, page 3

Group-based cooperative cache management for mobile clients in a mobile environment

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327907

Chi-Yin Chow, H. Leong, A. Chan

Citations: 57

SPAL: a speedy packet lookup technique for high-performance routers

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327934

N. Tzeng

Abstract: This work introduces and evaluates a technique for speedy packet lookups, called SPAL, in high-performance routers, realized by fragmenting the BGP routing table into subsets. Such a router contains multiple line cards (LCs), each of which is equipped with a forwarding engine (FE) to perform table lookups locally based on its forwarding table (which is a fragmented subset). The number of table entries in each FE drops as the number of LCs in a router grows. This reduction in the forwarding table size drastically lowers the amount of SRAM (e.g., L3 data cache) required in each LC to hold the trie constructed according to the matching algorithm. SPAL calls for caching the lookup result of a given IP address at its home LC (denoted by LC_ho, using the LR-cache), such that the result can satisfy the lookup requests for the same address from not only LC_ho but also other LCs quickly, when the switching fabric for interconnecting LCs has a low latency. Lookup results obtained from remote LCs are also held in the LR-cache of a local LC. Our trace-driven simulation reveals that SPAL indeed leads to substantial improvement in mean lookup performance. SPAL may possibly shorten the worst-case lookup time (thanks to fewer memory accesses during longest-prefix matching search) when compared with a current router without partitioning the routing table. It takes no specific traffic into consideration when selecting the partitioning bits, promising good scalability and a small mean lookup time per packet.
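The table fragmentation described in the SPAL abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than the paper's design: the choice of partitioning bits (`PARTITION_BITS`), the function names, the linear scan standing in for a trie, and a single dictionary standing in for the LR-cache at the home LC.

```python
import ipaddress

# Hypothetical choice: partition on the two most significant address bits.
PARTITION_BITS = (0, 1)


def bit(value, pos):
    """Return bit `pos` of a 32-bit value, counting from the most significant bit."""
    return (value >> (31 - pos)) & 1


def home_lc(addr):
    """Map a 32-bit address to the line card that owns it."""
    lc = 0
    for pos in PARTITION_BITS:
        lc = (lc << 1) | bit(addr, pos)
    return lc


def partition(routes):
    """Split (cidr, next_hop) routes into one forwarding table per line card."""
    k = len(PARTITION_BITS)
    tables = {lc: [] for lc in range(2 ** k)}
    for cidr, next_hop in routes:
        net = ipaddress.ip_network(cidr)
        value, length = int(net.network_address), net.prefixlen
        for lc in tables:
            # A prefix goes to every LC that agrees with the partitioning bits
            # the prefix actually fixes; bits beyond its length are wildcards,
            # so short prefixes are replicated.
            agrees = all(
                pos >= length or bit(value, pos) == (lc >> (k - 1 - i)) & 1
                for i, pos in enumerate(PARTITION_BITS)
            )
            if agrees:
                tables[lc].append((value, length, next_hop))
    return tables


def lookup(tables, cache, addr_str):
    """Longest-prefix match at the home LC, with results cached by address."""
    addr = int(ipaddress.ip_address(addr_str))
    if addr in cache:
        return cache[addr]
    best = None
    for value, length, next_hop in tables[home_lc(addr)]:
        mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
        if addr & mask == value and (best is None or length > best[0]):
            best = (length, next_hop)
    cache[addr] = best[1] if best else None
    return cache[addr]
```

Each line card only scans its own subset, which is the source of the smaller per-LC table the abstract mentions; a real forwarding engine would use a trie rather than a list.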

Citations: 1

Low-cost register-pressure prediction for scalar replacement using pseudo-schedules

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327911

Yin Ma, S. Carr, Rong Ge

Abstract: Scalar replacement is an effective optimization for removing memory accesses. However, exposing all possible array reuse with scalars may cause a significant increase in register pressure, resulting in register spilling and performance degradation. We present a low cost method to predict the register pressure of a loop before applying scalar replacement on high-level source code, called pseudo-schedule register prediction (PRP), that takes into account the effects of both software pipelining and register allocation. PRP attempts to eliminate the possibility of degradation from scalar replacement due to register spilling while providing opportunities for a good speedup. PRP uses three approximation algorithms: one for constructing a data dependence graph, one for computing the recurrence constraints of a software pipelined loop, and one for building a pseudo-schedule. Our experiments show that PRP predicts the floating-point register pressure within 2 registers and the integer register pressure within 2.7 registers on average with a time complexity of O(n^2) in practice. PRP achieves similar performance to the best previous approach, having O(n^3) complexity, with less than one-fourth of the compilation time on our test suite.
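For readers unfamiliar with scalar replacement, the transformation itself (not PRP, which the abstract does not specify in enough detail to reproduce) can be sketched as follows. The loop and function names are hypothetical, and Python is used only to show the shape of the rewrite; the actual benefit arises in a compiled language, where the scalar `t` would live in a register.

```python
def running_sum_plain(a, b):
    """Original loop: a[i - 1] is re-read from the array on every iteration."""
    a = list(a)
    for i in range(1, len(a)):
        a[i] = a[i - 1] + b[i]
    return a


def running_sum_scalar_replaced(a, b):
    """Same loop after scalar replacement: the reused value is carried in t."""
    a = list(a)
    if not a:
        return a
    t = a[0]  # holds the value that the next iteration would read as a[i - 1]
    for i in range(1, len(a)):
        t = t + b[i]
        a[i] = t  # the store remains; the load of a[i - 1] is gone
    return a
```

Every value kept in a scalar this way occupies a register for the duration of its reuse, which is the register pressure that the abstract says must be predicted before applying the optimization.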

Citations: 2

TAP: a novel tunneling approach for anonymity in structured P2P systems

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327900

Yingwu Zhu, Yimin Hu

Citations: 25

FIFO based multicast scheduling algorithm for VOQ packet switches

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327938

Deng Pan, Yuanyuan Yang

Abstract: Many networking/computing applications require high speed switching for multicast traffic at the switch/router level to save network bandwidth. However, existing queueing based packet switches and scheduling algorithms cannot perform well under multicast traffic. While the speedup requirement makes the output queued switch difficult to scale, the single input queued switch suffers from the head of line (HOL) blocking, which severely limits the network throughput. An efficient yet simple buffering strategy to remove the HOL blocking is to use the virtual output queueing (VOQ), which has been shown to perform well under unicast traffic. However, it is impractical to use the traditional virtual output queued (VOQ) switches for multicast traffic, because a VOQ multicast switch has to maintain an exponential number of queues in each input port. We give a novel queue structure for the input buffers of a VOQ multicast switch by separately storing the address information and data information of a packet, so that an input port only needs to manage a linear number of queues. In conjunction with the multicast VOQ switch, we present a first-in-first-out based multicast scheduling algorithm, FIFO Multicast Scheduling (FIFOMS), and conduct extensive simulations to compare FIFOMS with other popular scheduling algorithms. Our results fully demonstrate the superiority of FIFOMS in both multicast latency and queue space requirement.
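The idea of storing address and data information separately, so that an input port needs only one queue per output, can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's data structure: the class names, the `remaining` fanout counter, and the arrival stamp are hypothetical, and the FIFOMS scheduler itself is not modelled.

```python
from collections import deque


class DataCell:
    """Payload stored once per multicast packet, shared by all its destinations."""

    def __init__(self, payload, fanout):
        self.payload = payload
        self.remaining = fanout  # destinations that still have to be served


class InputPort:
    """One queue per output port: linear, not exponential, in the port count."""

    def __init__(self, num_outputs):
        self.queues = [deque() for _ in range(num_outputs)]
        self.arrivals = 0

    def enqueue(self, payload, destinations):
        """Store the payload once and add one address entry per destination."""
        cell = DataCell(payload, len(destinations))
        self.arrivals += 1  # assumed arrival stamp, usable for oldest-first service
        for out in destinations:
            self.queues[out].append((self.arrivals, cell))

    def send(self, out):
        """Serve the head of one output queue; report whether the data cell is done."""
        _stamp, cell = self.queues[out].popleft()
        cell.remaining -= 1
        return cell.payload, cell.remaining == 0
```

A packet destined for several outputs therefore costs one data cell plus one small address entry per destination, and the data cell can be released once `remaining` reaches zero.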

Citations: 8

Non-uniform dependences partitioned by recurrence chains

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327909

Y. Yu, E. D'Hollander

Citations: 7

An effective fault-tolerant routing methodology for direct networks

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327925

M. E. Gómez, J. Flich, P. López, A. Robles, J. Duato, N. Nordbotten, Olav Lysne, T. Skeie

Abstract: Current massively parallel computing systems are being built with thousands of nodes, which significantly affect the probability of failure. M. E. Gómez proposed a methodology to design fault-tolerant routing algorithms for direct interconnection networks. The methodology uses a simple mechanism: for some source-destination pairs, packets are first forwarded to an intermediate node, and later, from this node to the destination node. Minimal adaptive routing is used along both subpaths. For those cases where the methodology cannot find a suitable intermediate node, it combines the use of intermediate nodes with two additional mechanisms: disabling adaptive routing and using misrouting on a per-packet basis. While the combination of these three mechanisms tolerates a large number of faults, each one requires adding some hardware support in the network and also introduces some overhead. In this paper, we perform an in-depth analysis of the impact of these mechanisms on network behaviour. We analyze the impact of the three mechanisms separately and combined. The ultimate goal of this paper is to obtain a suitable combination of mechanisms that is able to meet the trade-off between fault-tolerance degree, routing complexity, and performance.
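The intermediate-node mechanism can be illustrated with a small Python sketch on a 2D mesh. It is a simplification under explicit assumptions, not the authors' selection procedure: faults are whole nodes, minimal adaptive routing may use any node in the bounding box of the two endpoints, so a subpath is treated as safe only when that box is fault-free, and all function names are hypothetical.

```python
from itertools import product


def box_is_fault_free(a, b, faults):
    """True if no faulty node lies on any minimal path between a and b."""
    xs = range(min(a[0], b[0]), max(a[0], b[0]) + 1)
    ys = range(min(a[1], b[1]), max(a[1], b[1]) + 1)
    return not any((x, y) in faults for x, y in product(xs, ys))


def distance(a, b):
    """Hop count of a minimal path in a mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def pick_intermediate(src, dst, faults, size):
    """Return None if direct routing is safe, else the best intermediate node.

    Raises LookupError when no single intermediate node makes both subpaths
    safe under minimal adaptive routing.
    """
    if box_is_fault_free(src, dst, faults):
        return None
    candidates = [
        n for n in product(range(size), repeat=2)
        if n not in faults and n not in (src, dst)
        and box_is_fault_free(src, n, faults)
        and box_is_fault_free(n, dst, faults)
    ]
    if not candidates:
        raise LookupError("no suitable intermediate node")
    # Prefer the candidate that adds the fewest hops to the route.
    return min(candidates, key=lambda n: distance(src, n) + distance(n, dst))
```

In this model, a fault at (1, 1) between (0, 0) and (2, 2) is avoided by routing through a corner such as (0, 2) at no extra hop cost, whereas a fault at (1, 0) on the straight line from (0, 0) to (3, 0) leaves no usable intermediate node. The second situation is the kind of case for which the abstract introduces disabling adaptive routing and misrouting.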

Citations: 15

Probabilistic real-time guarantees for component-oriented phased array radars

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327926

Chin-Fu Kuo, Ya-Shu Chen, Tei-Wei Kuo, P. Lin, Cheng Chang

Citations: 1

Distributed QoS-aware scheduling algorithm for WDM optical interconnects with arbitrary wavelength conversion capability

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327936

Zhenghao Zhang, Yuanyuan Yang

Citations: 1

BUCS - a bottom-up cache structure for networked storage servers

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327937

Ming Zhang, Qing Yang

Abstract: This paper introduces a new caching structure to improve server performance by minimizing data traffic over the system bus. The idea is to form a bottom-up caching hierarchy in a networked storage server. The bottom level cache is located on an embedded controller that is a combination of a network interface card (NIC) and a storage host bus adapter (HBA). Storage data coming from or going to a network are cached at this bottom level cache and meta-data related to these data are passed to the host for processing. When cached data exceed the capacity of the bottom level cache, some data are moved to the host RAM that is usually larger than the bottom level cache. This new cache hierarchy is referred to as bottom-up cache structure (BUCS) in contrast to a traditional CPU-centric top-down cache where the top-level cache is the smallest and fastest, and the lower in the hierarchy the larger and slower the cache. Such data caching at the controller level dramatically reduces bus traffic and leads to great performance improvement for networked storage. We have implemented a proof-of-concept prototype using Intel's IQ80310 reference board and Linux network block device. Through performance measurements on the prototype implementation, we observed up to 3 times performance improvement of BUCS over traditional systems in terms of response time and system throughput.

Citations: 2