International Conference on Parallel Processing, 2004. ICPP 2004: Latest Papers

An efficient deadlock-free tree-based routing algorithm for irregular wormhole-routed networks based on the turn model
International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327941
Yau-Ming Sun, Chih-Hsueh Yang, Yeh-Ching Chung, Tai-Yi Huang
Abstract: We propose an efficient deadlock-free tree-based routing algorithm, DOWN/UP routing, for irregular wormhole-routed networks based on the turn model. In a tree-based routing algorithm, hot spots around the root of the spanning tree and uneven traffic distribution are the two main factors that degrade performance. To address these two problems, DOWN/UP routing pushes traffic downward toward the leaves of the spanning tree as much as possible and removes prohibited turn pairs with opposite directions in each node, respectively. To evaluate its performance, we implemented DOWN/UP routing along with L-turn routing on the IRFlexSim0.5 simulator and simulated irregular networks of 128 switches with 4-port and 8-port configurations. The simulation results show that the proposed routing algorithm outperforms L-turn routing on all test samples in terms of the degree of hot spots, traffic load distribution, and throughput.
Citations: 27
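For readers unfamiliar with tree-based, turn-restricted routing, the following is a minimal sketch of the classical up*/down* rule, which is the general family this abstract belongs to. It is not the DOWN/UP algorithm of the paper, whose turn prohibitions are not given in the abstract. The example topology and the helper names (`bfs_levels`, `is_up`, `is_legal_path`) are illustrative assumptions.

```python
from collections import deque

def bfs_levels(adj, root):
    """Breadth-first distance of every node from the spanning-tree root."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level

def is_up(level, u, v):
    """A hop u -> v is 'up' if v is nearer the root (ties broken by node id)."""
    return (level[v], v) < (level[u], u)

def is_legal_path(level, path):
    """Classical up*/down*: zero or more up hops, then zero or more down hops.

    Prohibiting the down -> up turn breaks every cycle of channel
    dependencies, which is what makes the routing deadlock-free.
    """
    seen_down = False
    for u, v in zip(path, path[1:]):
        if is_up(level, u, v):
            if seen_down:
                return False  # down -> up turn is prohibited
        else:
            seen_down = True
    return True

# Hypothetical 4-switch irregular network (undirected adjacency lists).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
level = bfs_levels(adj, 0)

legal = is_legal_path(level, [3, 1, 0, 2])    # up, up, down
illegal = is_legal_path(level, [1, 3, 2])     # down, then up
print(legal, illegal)
```

Because every legal path may climb toward the root before descending, traffic tends to concentrate near the root, which is the hot-spot problem the abstract refers to.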
A novel FDTD application featuring OpenMP-MPI hybrid parallelization
International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327945
M. Su, I. El-Kady, David A. Bader, Shawn-Yu Lin
Abstract: We have developed a high-performance hybrid parallel finite-difference time-domain (FDTD) algorithm featuring both OpenMP shared-memory programming and MPI message passing. Our goal is to effectively model the optical characteristics of a novel light source created by utilizing a new class of materials known as photonic band-gap crystals. Our method is based on the solution of the second-order discretized Maxwell's equations in space and time. This hybrid parallelization scheme allows us to take advantage of the new generation of parallel machines built from connected SMP nodes. Using parallel computation, we are able to complete a calculation on 24 processors in less than a day, where a serial version would have taken over three weeks. We present a detailed study of this hybrid scheme on an SGI Origin 2000 distributed shared-memory ccNUMA system, along with a complete investigation of the advantages and drawbacks of the method.
Citations: 54
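The timing claim in this abstract implies a lower bound on speedup and parallel efficiency. The short calculation below makes that arithmetic explicit. It assumes "over three weeks" means at least 21 days of serial time and "less than a day" means at most 1 day on 24 processors. The abstract gives no exact timings, so the results are bounds, not measured values.

```python
def speedup_lower_bound(serial_days_min, parallel_days_max):
    """Speedup = serial time / parallel time; bounds give a lower bound."""
    return serial_days_min / parallel_days_max

def parallel_efficiency(speedup, processors):
    """Efficiency = speedup / number of processors."""
    return speedup / processors

# Assumed bounds read off the abstract: >= 21 days serial, <= 1 day parallel.
speedup = speedup_lower_bound(21.0, 1.0)
efficiency = parallel_efficiency(speedup, 24)

print(f"speedup > {speedup:.1f}, efficiency > {efficiency:.3f}")
```

Under these assumptions the speedup exceeds 21 on 24 processors, an efficiency above 87.5%.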
The k-valent graph: a new family of Cayley graphs for interconnection networks
International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327923
S. Hsieh, T. Hsiao
Abstract: This work introduces a new family of Cayley graphs, named the k-valent graphs, for building interconnection networks. It includes the trivalent Cayley graphs (Vadapalli and Srimani, 1995) as a subclass. These new graphs are shown to be regular with node degree k, to have diameter logarithmic in the number of nodes, and to be k-connected as well as maximally fault tolerant. We also propose a shortest-path routing algorithm and investigate some algebraic properties, such as the embedding of cycles or cliques.
Citations: 2
RMAC: a reliable multicast MAC protocol for wireless ad hoc networks
International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327959
Weisheng Si, Chengzhi Li
Abstract: This work presents a new MAC protocol called RMAC that supports reliable multicast for wireless ad hoc networks. By utilizing the busy tone mechanism to realize multicast reliability, RMAC has the following three novelties: (1) it uses a variable-length control frame to stipulate an order for the receivers to respond, such that the problem of feedback collision is solved; (2) it extends the traditional usage of busy tone for preventing data frame collisions into the multicast scenario; and (3) it introduces a new usage of busy tone for acknowledging data frames. In addition, we generalize RMAC into a comprehensive MAC protocol that provides both reliable and unreliable services for all three modes of communication: unicast, multicast, and broadcast. Our evaluation shows that RMAC achieves high reliability with very limited overhead. We also compare RMAC with other reliable multicast MAC protocols, showing that RMAC not only provides higher reliability but also involves lower cost.
Citations: 66
Applying array contraction to a sequence of DOALL loops
International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327903
Yonghong Song, Zhiyuan Li
Abstract: Efficient program execution on multiprocessor computers requires both sufficient parallelism and good data locality. Recent research found that, using a combination of loop shifting, loop fusion, and array contraction, one can reduce the memory required to execute a sequence of serial loops, thereby improving cache locality. This paper studies how to extend such a memory-reduction scheme to a sequence of DOALL loops, which are executed in parallel on multiprocessors. Two methods are proposed to overcome difficulties caused by loop-carried dependences. Data copy-in is performed to remove anti-dependences between different parallel threads, and computation duplication is performed to remove flow dependences. Experiments performed on a number of benchmark programs show that the proposed technique improves both cache locality and parallel execution speed for the DOALL loops. The scheme achieves an average speedup of 1.41 for 17 programs on a 4-processor SUN machine.
Citations: 4
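As a rough illustration of the serial memory-reduction idea this abstract builds on, the sketch below fuses two loops and contracts the temporary array between them into a scalar. It is a generic textbook-style example under simple assumptions: no loop shifting is needed, and the cross-thread dependences that the paper handles with copy-in and computation duplication do not arise. The function names are hypothetical and not taken from the paper.

```python
def separate_loops(a):
    """Two loops communicating through a full-size temporary array."""
    n = len(a)
    tmp = [0.0] * n              # O(n) temporary storage
    for i in range(n):
        tmp[i] = a[i] * 2.0
    out = [0.0] * n
    for i in range(n):
        out[i] = tmp[i] + 1.0
    return out

def fused_and_contracted(a):
    """Same computation after loop fusion and array contraction.

    Once the loops are fused, tmp[i] is produced and consumed in the same
    iteration, so the array can be contracted to a single scalar.
    """
    n = len(a)
    out = [0.0] * n
    for i in range(n):
        t = a[i] * 2.0           # tmp[i] contracted to the scalar t
        out[i] = t + 1.0
    return out

data = [1.0, 2.0, 3.0]
print(separate_loops(data), fused_and_contracted(data))
```

The fused version touches each element once and needs no temporary array, which is the source of the cache-locality benefit the abstract describes.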
Improving load/store queues usage in scientific computing
International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327902
C. Lemuet, W. Jalby, S. Touati
Abstract: Memory disambiguation mechanisms, coupled with load/store queues in out-of-order processors, are crucial to increasing instruction-level parallelism (ILP), especially for memory-bound scientific codes. Designing ideal memory disambiguation mechanisms is too complex because it would require precise address-bit comparators; thus, modern microprocessors implement simplified and imprecise ones that perform only partial address comparisons. In this paper, we study the impact of such simplifications on the sustained performance of some real processors, such as the Alpha 21264, Power 4, and Itanium 2. Despite all the advanced features of these processors, we demonstrate that memory address disambiguation mechanisms can cause significant performance loss. Even if data are located in low cache levels and enough ILP exists, execution can be up to 21 times slower if no care is taken over the order in which independent memory addresses are accessed. Instead of proposing a hardware solution to improve load/store queues, as done in [G. Chrysos et al. (1998), S. Sethumadhavan et al. (2003), I. Park et al. (2003), A. Yoaz et al. (1999), S. Onder (2002)], we show that a software (compilation) technique is possible. This solution is based on the classical (and robust) ld/st vectorization. Our experiments highlight the effectiveness of this method on BLAS 1 codes that are representative of vector scientific loops.
Citations: 10
Using tiling to scale parallel data cube construction
International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327944
R. Jin, K. Vaidyanathan, Ge Yang, G. Agrawal
Abstract: Data cube construction is a commonly used operation in data warehouses. Because of the volume of data that is stored and analyzed in a data warehouse and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. Also, for both sequential and parallel data cube construction, effectively using the main memory is an important challenge. In our prior work, we developed parallel algorithms for this problem. We show how sequential and parallel data cube construction algorithms can be further scaled to handle larger problems, when the memory requirements could be a constraint. This is done by tiling the input and output arrays on each node. We address the challenges in using tiling while still maintaining the other desired properties of a data cube construction algorithm, namely using minimal parents and achieving maximal cache and memory reuse. We present a parallel algorithm that combines tiling with interprocessor communication. Our experimental results show the following. First, tiling helps in scaling data cube construction in both sequential and parallel environments. Second, choosing tiling parameters as per our theoretical results does result in better performance.
Citations: 1
Global partial replicate computation partitioning
International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327910
Yiran Wang, Li Chen, Xiaobing Feng, Zhaoqing Zhang
Abstract: Early parallelizing compilers use the owner-computes rule to partition computation. Partial replication was then introduced to eliminate near-neighbor communication at the cost of some replicated computation, thereby improving performance and scalability. Current exploration of partial replicate computation partitioning is limited to a single loop nest. We present a formal description of the global partial replicate computation partitioning problem, a simplified cost model, and a heuristic solution. Experimental results show that the solution is superior to local approaches.
Citations: 1
Architecture and implementation of chip multiprocessors: custom logic components and software for rapid prototyping
International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327958
N. Manjikian, Huang Jin, J. Reed, N. Cordeiro
Abstract: This work describes components and software tools in support of rapid prototyping in programmable logic for research on chip multiprocessors. Contemporary programmable logic chips offer considerable on-chip logic and memory resources. Prototyping of systems in programmable logic chips is faster and less costly than full-custom chip design. The first contribution described in this paper is a collection of original research-oriented logic components that provide processor, memory, and interconnect functionality for rapid prototyping. Because these are original components, and not proprietary vendor-supplied components, they may be arbitrarily extended and modified to suit research needs. The second contribution is a set of enhanced software tools for generating executable code. The third contribution is user-configurable software for testing and evaluating prototype chip multiprocessor implementations in hardware. In addition to describing these contributions, this paper provides results from implementing and testing prototype components and complete chip multiprocessors, including simulation waveforms, logic chip resource utilization, and observations of hardware operation.
Citations: 3
OSCAR - an opportunistic call admission protocol for LEO satellite networks
International Conference on Parallel Processing, 2004. ICPP 2004. Pub Date : 2004-08-15 DOI: 10.1109/ICPP.2004.1327965
S. Olariu, Rajendra Shirhatti, Albert Y. Zomaya
Abstract: The main contribution of this work is to propose OSCAR, an opportunistic call admission protocol that provides a simple and robust solution to call admission and handoff management in LEO satellite networks. One of the features that sets OSCAR apart from existing protocols is that it avoids the overhead of reserving resources for users in a series of spotbeams along predicted user trajectories. Instead, OSCAR relies on a novel opportunistic bandwidth allocation mechanism that is very simple and efficient and does not involve maintaining complicated data structures or making expensive reservations. Extensive simulation results have shown that OSCAR achieves results comparable to those of Q-Win: it features very low call dropping probability, thus providing for reliable handoff of ongoing calls, low call blocking probability for new call requests, and high bandwidth utilization.
Citations: 9