International Conference on Parallel Processing, 2004. ICPP 2004. Latest papers, page 6

An efficient deadlock-free tree-based routing algorithm for irregular wormhole-routed networks based on the turn model

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327941

Yau-Ming Sun, Chih-Hsueh Yang, Yeh-Ching Chung, Tai-Yi Huang

Abstract: We propose an efficient deadlock-free tree-based routing algorithm, DOWN/UP routing, for irregular wormhole-routed networks based on the turn model. In a tree-based routing algorithm, hot spots around the root of the spanning tree and uneven traffic distribution are the two main factors that degrade performance. To address the hot spot problem, DOWN/UP routing pushes traffic downward toward the leaves of the spanning tree as much as possible; to address the uneven traffic distribution, it removes prohibited turn pairs with opposite directions in each node. To evaluate its performance, we implemented DOWN/UP routing along with L-turn routing on the IRFlexSim0.5 simulator and simulated irregular networks of 128 switches with 4-port and 8-port configurations. The simulation results show that the proposed routing algorithm outperforms L-turn routing for all test samples in terms of the degree of hot spots, the traffic load distribution, and throughput.

Citations: 27

A novel FDTD application featuring OpenMP-MPI hybrid parallelization

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327945

M. Su, I. El-Kady, David A. Bader, Shawn-Yu Lin

Citations: 54

The k-valent graph: a new family of Cayley graphs for interconnection networks

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327923

S. Hsieh, T. Hsiao

Citations: 2

RMAC: a reliable multicast MAC protocol for wireless ad hoc networks

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327959

Weisheng Si, Chengzhi Li

Citations: 66

Applying array contraction to a sequence of DOALL loops

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327903

Yonghong Song, Zhiyuan Li

Citations: 4

Improving load/store queues usage in scientific computing

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327902

C. Lemuet, W. Jalby, S. Touati

Abstract: Memory disambiguation mechanisms, coupled with load/store queues in out-of-order processors, are crucial to increasing instruction-level parallelism (ILP), especially for memory-bound scientific codes. Designing ideal memory disambiguation mechanisms is too complex because it would require precise address bit comparators; thus, modern microprocessors implement simplified and imprecise ones that perform only partial address comparisons. In this paper, we study the impact of such simplifications on the sustained performance of some real processors such as the Alpha 21264, Power 4, and Itanium 2. Despite all the advanced features of these processors, we demonstrate that memory address disambiguation mechanisms can cause significant performance loss. We demonstrate that, even if data are located in low cache levels and enough ILP exists, execution can be up to 21 times slower if no care is taken over the order in which independent memory addresses are accessed. Instead of proposing a hardware solution to improve load/store queues, as done in [G. Chrysos et al., (1998), S. Sethumadhavan et al., (2003), I. Park et al., (2003), A. Yoaz et al., (1999), S. Onder (2002)], we show that a software (compilation) technique is possible. Such a solution is based on the classical (and robust) ld/st vectorization. Our experiments highlight the effectiveness of such a method on BLAS 1 codes that are representative of vector scientific loops.

Citations: 10
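The "partial address comparisons" mentioned in the abstract of the entry above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the mechanism of any of the processors named there: the abstract does not say which address bits are compared, so the choice of low-order bits, the width `COMPARED_BITS`, and all function names are hypothetical.

```python
"""Toy model of partial address comparison in a load/store queue.

Assumption (not from the abstract): the imprecise check compares only the
low-order COMPARED_BITS bits of each address.
"""

COMPARED_BITS = 12  # hypothetical width; real processors differ


def full_conflict(load_addr: int, store_addr: int) -> bool:
    """Exact disambiguation: a conflict only if the addresses are equal."""
    return load_addr == store_addr


def partial_conflict(load_addr: int, store_addr: int,
                     bits: int = COMPARED_BITS) -> bool:
    """Imprecise disambiguation: compare only the low-order `bits` bits."""
    mask = (1 << bits) - 1
    return (load_addr & mask) == (store_addr & mask)


def count_false_conflicts(loads, stores, bits: int = COMPARED_BITS) -> int:
    """Count pairs flagged by the partial check but not by the exact one."""
    return sum(
        1
        for ld in loads
        for st in stores
        if partial_conflict(ld, st, bits) and not full_conflict(ld, st)
    )


if __name__ == "__main__":
    # Two independent arrays whose base addresses share their low 12 bits.
    loads = [0x10000 + 8 * i for i in range(4)]
    stores = [0x20000 + 8 * i for i in range(4)]
    print(count_false_conflicts(loads, stores))

    # Offsetting one array changes the low-order bits of its addresses.
    shifted = [addr + 64 for addr in stores]
    print(count_false_conflicts(loads, shifted))
```

In this toy setting, the independent accesses are flagged as conflicting only when their low-order bits coincide, which is one way to picture why the order and placement of independent memory accesses can matter.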

Using tiling to scale parallel data cube construction

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327944

R. Jin, K. Vaidyanathan, Ge Yang, G. Agrawal

Abstract: Data cube construction is a commonly used operation in data warehouses. Because of the volume of data that is stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. Also, for both sequential and parallel data cube construction, effectively using the main memory is an important challenge. In our prior work, we developed parallel algorithms for this problem. We show how sequential and parallel data cube construction algorithms can be further scaled to handle larger problems, when the memory requirements could be a constraint. This is done by tiling the input and output arrays on each node. We address the challenges in using tiling while still maintaining the other desired properties of a data cube construction algorithm, namely using minimal parents and achieving maximal cache and memory reuse. We present a parallel algorithm that combines tiling with interprocessor communication. Our experimental results show the following. First, tiling helps in scaling data cube construction in both sequential and parallel environments. Second, choosing tiling parameters as per our theoretical results does result in better performance.

Citations: 1
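The idea of tiling the input and output arrays, as described in the abstract of the entry above, can be pictured with a small sequential sketch. It is not the authors' algorithm: it ignores minimal parents and interprocessor communication, handles a single aggregation (summing a 3-D array over its last dimension), and the function names and tile parameters are hypothetical.

```python
"""Toy tiled aggregation: sum a 3-D array over its last dimension.

Only a sketch of the tiling idea; names and tile sizes are illustrative.
"""
from itertools import product


def aggregate_untiled(cube):
    """Reference result: out[i][j] = sum over k of cube[i][j][k]."""
    return [[sum(cell) for cell in row] for row in cube]


def aggregate_tiled(cube, tile_i, tile_j, tile_k):
    """Same result, visiting the input one tile at a time.

    Each step reads one tile_i x tile_j x tile_k block of the input and
    accumulates partial sums into the matching tile_i x tile_j block of
    the output.
    """
    n_i, n_j, n_k = len(cube), len(cube[0]), len(cube[0][0])
    out = [[0] * n_j for _ in range(n_i)]
    tile_origins = product(
        range(0, n_i, tile_i), range(0, n_j, tile_j), range(0, n_k, tile_k)
    )
    for i0, j0, k0 in tile_origins:
        for i in range(i0, min(i0 + tile_i, n_i)):
            for j in range(j0, min(j0 + tile_j, n_j)):
                # Partial sum over this tile's slice of the last dimension.
                out[i][j] += sum(cube[i][j][k0:k0 + tile_k])
    return out


if __name__ == "__main__":
    cube = [[[i + j + k for k in range(5)] for j in range(4)] for i in range(3)]
    assert aggregate_tiled(cube, 2, 3, 2) == aggregate_untiled(cube)
    print("tiled and untiled results match")
```

The tile sizes here do not divide the array dimensions evenly on purpose, so the sketch also exercises the partial tiles at the array edges.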

Global partial replicate computation partitioning

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327910

Yiran Wang, Li Chen, Xiaobing Feng, Zhaoqing Zhang

Citations: 1

Architecture and implementation of chip multiprocessors: custom logic components and software for rapid prototyping

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327958

N. Manjikian, Huang Jin, J. Reed, N. Cordeiro

Abstract: This work describes components and software tools in support of rapid prototyping in programmable logic for research on chip multiprocessors. Contemporary programmable logic chips offer considerable on-chip logic and memory resources. Prototyping of systems in programmable logic chips is faster and less costly than full-custom chip design. The first contribution described in this paper is a collection of original research-oriented logic components that provides processor, memory, and interconnect functionality for rapid prototyping. Because these are original components, and not proprietary vendor-supplied components, they may be arbitrarily extended and modified to suit research needs. The second contribution is a set of enhanced software tools for generating executable code. The third contribution is user-configurable software for testing and evaluating prototype chip multiprocessor implementations in hardware. In addition to describing these contributions, this paper provides results from implementing and testing prototype components and complete chip multiprocessors, including simulation waveforms, logic chip resource utilization, and observations of hardware operation.

Citations: 3

OSCAR - an opportunistic call admission protocol for LEO satellite networks

International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327965

S. Olariu, Rajendra Shirhatti, Albert Y. Zomaya

Citations: 9