2008 37th International Conference on Parallel Processing: Latest Publications

On the Potentials of Segment-Based Routing for NoCs
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.56
A. Mejia, J. Flich, J. Duato
Abstract: The topology, the routing algorithm, and the way the traffic pattern is distributed over the network influence the ultimate performance of the interconnection network. Off-chip high-performance interconnects provide mechanisms to support irregular topologies, whereas in on-chip networks the topology is fixed at design time. Continuing trends in device miniaturization and high-volume manufacturing increase the probability of faults in embedded systems, leading to irregular topologies. Partitionability and virtualization of the entire on-chip network are also envisioned for future systems. These trends create a need for routing algorithms that adapt to static or dynamic changes in irregular topologies. In this paper we analyze the benefits of reconfiguration at the routing-algorithm level in order to accommodate topology changes, i.e., changes that appear in the network due to switch or link failures, energy-reduction decisions, or design and manufacturing issues. We perform an exhaustive analysis of the performance impact of the routing algorithm in a NoC system, with the aim of enabling reconfiguration of the routing algorithm. We take advantage of the flexibility offered by the segment-based routing (SR) methodology, which allows fast computation of many deadlock-free routing algorithms by applying different segmentation processes and routing-restriction policies. This study analyzes the potentials offered by SR. Results show that the choice of routing algorithm may greatly affect the final performance of the network. Additionally, we propose an organized segmentation process that achieves reliable performance with low variability for all topologies studied under uniform traffic conditions. These results encourage us to search for a dynamic mechanism that adapts the routing algorithm to the traffic.
Citations: 35
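
The SR methodology above generates many candidate routing functions by placing routing restrictions; deadlock freedom of any such function can be verified by checking that its channel dependency graph is acyclic. The sketch below is not the SR algorithm itself, but a minimal illustration of that check for plain XY routing on a 3x3 mesh (the mesh size and the `xy_route` helper are assumptions for illustration):

```python
from itertools import product

def xy_route(src, dst):
    """Return the list of channels (directed links) on the XY route src -> dst."""
    (x, y), (dx, dy) = src, dst
    path = []
    while x != dx:                       # X dimension first
        nx = x + (1 if dx > x else -1)
        path.append(((x, y), (nx, y)))
        x = nx
    while y != dy:                       # then Y dimension
        ny = y + (1 if dy > y else -1)
        path.append(((x, y), (x, ny)))
        y = ny
    return path

def channel_dependency_graph(nodes, route):
    """Edge c1 -> c2 whenever some route uses channel c2 immediately after c1."""
    deps = {}
    for src, dst in product(nodes, repeat=2):
        p = route(src, dst)
        for c in p:
            deps.setdefault(c, set())
        for c1, c2 in zip(p, p[1:]):
            deps[c1].add(c2)
    return deps

def is_acyclic(deps):
    WHITE, GREY, BLACK = 0, 1, 2
    color = {c: WHITE for c in deps}
    def dfs(c):
        color[c] = GREY
        for n in deps[c]:
            if color[n] == GREY or (color[n] == WHITE and not dfs(n)):
                return False                 # grey neighbor => cycle
        color[c] = BLACK
        return True
    return all(color[c] != WHITE or dfs(c) for c in deps)

nodes = [(x, y) for x in range(3) for y in range(3)]
print(is_acyclic(channel_dependency_graph(nodes, xy_route)))  # XY routing is deadlock-free
```

The same check applies unchanged to any routing function produced by a segmentation process: only the `route` callable changes.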
Two-Level Reorder Buffers: Accelerating Memory-Bound Applications on SMT Architectures
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.24
Jason Loew, D. Ponomarev
Abstract: We propose a low-complexity mechanism for accelerating memory-bound threads on SMT processors without adversely impacting the performance of other concurrently running applications. The main idea is a two-level organization of the Reorder Buffer (ROB), where the first level consists of small private per-thread ROBs used in the normal course of execution in the absence of last-level cache misses. The second ROB level is a much larger storage that can be used on demand by threads experiencing last-level cache misses. The key feature of our scheme is that a second-level ROB partition is allocated to a thread experiencing a last-level cache miss only if the number of instructions dependent on the missing load is below a predetermined threshold. We introduce a novel low-complexity mechanism to count the number of load-dependent instructions and propose two schemes for allocating the second-level ROB: predictive and reactive. Our results demonstrate about a 30% improvement over the DCRA resource-distribution mechanism in terms of the "harmonic mean of weighted IPCs" metric.
Citations: 3
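
The evaluation metric named above is straightforward to compute: each thread's weighted IPC is its IPC when co-scheduled divided by its IPC when running alone, and the metric is the harmonic mean of those ratios. A minimal sketch (the sample IPC numbers are made up, not from the paper):

```python
def hmean_weighted_ipc(smt_ipc, alone_ipc):
    """Harmonic mean of weighted IPCs, where weighted IPC_i = IPC_i(SMT) / IPC_i(alone)."""
    weighted = [s / a for s, a in zip(smt_ipc, alone_ipc)]
    return len(weighted) / sum(1.0 / w for w in weighted)

# Two co-scheduled threads, each running at half its standalone IPC:
print(hmean_weighted_ipc([0.8, 0.6], [1.6, 1.2]))  # -> 0.5
```

The harmonic mean penalizes configurations that starve one thread to boost another, which is why it is preferred over a plain IPC sum for SMT fairness comparisons.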
The MAP3S Static-and-Regular Mesh Simulation and Wavefront Parallel-Programming Patterns
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.34
R. Niewiadomski, J. N. Amaral, D. Szafron
Abstract: This paper presents the simulation and wavefront parallel-programming patterns of the MAP3S pattern-based parallel-programming system for distributed-memory environments. Both patterns target iterative computations on static and regular meshes. In addition to performance-oriented features, such as asynchronous communication and a distribution of the computational workload tailored to fit the computation, the patterns provide usability-oriented features, such as direct mesh access, mesh memory-footprint distribution, and a versatile data-dependency specification scripting language. Parallel programs developed using MAP3S achieve significant performance gains and capability enhancements on both low-end and high-end interconnect-equipped distributed-memory systems.
Citations: 1
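
The wavefront pattern targeted by MAP3S processes a regular mesh along anti-diagonals: cell (i, j) depends on its north and west neighbors, so all cells on one anti-diagonal are independent and can run in parallel. A minimal sequential sketch of that dependency sweep (the path-counting update rule is a stand-in for a real stencil, not MAP3S code):

```python
def wavefront_sweep(n, m):
    """Sweep an n x m mesh in anti-diagonal order; cell (i, j) reads (i-1, j) and (i, j-1)."""
    val = [[0] * m for _ in range(n)]
    for d in range(n + m - 1):                       # one anti-diagonal per step
        for i in range(max(0, d - m + 1), min(n, d + 1)):   # cells on this diagonal
            j = d - i                                #   are mutually independent
            if i == 0 and j == 0:
                val[i][j] = 1
            else:
                west = val[i][j - 1] if j > 0 else 0
                north = val[i - 1][j] if i > 0 else 0
                val[i][j] = west + north             # example update: monotone path count
    return val

print(wavefront_sweep(3, 3)[2][2])  # -> 6 paths from (0,0) to (2,2)
```

In a distributed-memory implementation, the inner loop over a diagonal is what gets partitioned across processes, with halo exchanges between successive diagonals.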
Prefix Computation and Sorting in Dual-Cube
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.18
Yamin Li, S. Peng, Wanming Chu
Abstract: In this paper, we describe two algorithmic techniques for the design of efficient algorithms in the dual-cube. The first uses the cluster structure of the dual-cube, and the second uses its recursive structure. Based on these two techniques, we propose efficient algorithms for parallel prefix computation and sorting in the dual-cube. For a dual-cube D_n with 2^(2n-1) nodes and n links per node, the communication and computation times of the parallel prefix algorithm are at most 2n+1 and 4n-2, respectively; those of the sorting algorithm are at most 6n^2 and 2n^2, respectively.
Citations: 3
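
For context on the bounds above, the classic parallel prefix algorithm on an ordinary d-dimensional hypercube (the network the dual-cube is derived from) can be simulated as below. This is the textbook hypercube algorithm, not the paper's dual-cube variant; each node keeps a running prefix and a subcube total, and in round k it exchanges totals with its neighbor across dimension k:

```python
def hypercube_prefix_sum(values):
    """Simulate parallel prefix sums on a hypercube with len(values) = 2^d nodes."""
    p = len(values)
    d = p.bit_length() - 1
    assert 1 << d == p, "node count must be a power of two"
    prefix = list(values)   # prefix[i]: running sum of values[0..i]
    total = list(values)    # total[i]: sum over node i's current subcube
    for k in range(d):
        new_total = total[:]
        for i in range(p):                   # all nodes exchange in parallel
            partner = i ^ (1 << k)
            new_total[i] = total[i] + total[partner]
            if partner < i:                  # partner's subcube holds earlier ranks
                prefix[i] += total[partner]
        total = new_total
    return prefix

print(hypercube_prefix_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # -> [1, 3, 6, 10, 15, 21, 28, 36]
```

The hypercube version needs d communication rounds for 2^d nodes; the paper's contribution is achieving comparable bounds on the dual-cube, which has far fewer links per node (n links for 2^(2n-1) nodes).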
Location Dependent Cooperative Caching in MANET
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.26
Yilin Wang, E. Chan, Wenzhong Li, Sanglu Lu
Abstract: Location-dependent information services (LDISs) have gained increasing popularity in recent years. Due to limited client power and intermittent connectivity, caching is an important approach to improving the performance of LDISs. In this paper, we propose a new replacement policy called location-dependent cooperative caching (LDCC). Unlike existing location-dependent cache-replacement policies, the LDCC strategy applies a prediction model to approximate client movement behavior and a probabilistic transition model to analyze the communication cost, yielding a substantial improvement in overall performance. Simulation results demonstrate that the proposed strategy significantly outperforms existing policies.
Citations: 8
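
The general idea of weighting cached items by predicted future access and by communication cost can be illustrated with a generic cost-based victim-selection rule. The scoring function below is a hypothetical stand-in, not the LDCC policy from the paper (which combines a movement-prediction model with a probabilistic transition model), and the item names and numbers are invented:

```python
def pick_victim(cache):
    """Evict the entry with the lowest expected re-fetch cost:
    score = P(accessed again) * cost of fetching it back over the network."""
    return min(cache, key=lambda k: cache[k]["p_access"] * cache[k]["fetch_cost"])

cache = {
    "map_tile_A": {"p_access": 0.60, "fetch_cost": 4.0},  # likely needed, cheap to refetch
    "map_tile_B": {"p_access": 0.10, "fetch_cost": 9.0},  # unlikely, moderately costly
    "map_tile_C": {"p_access": 0.50, "fetch_cost": 1.0},  # likely, but nearly free to refetch
}
print(pick_victim(cache))  # -> map_tile_C (lowest expected re-fetch cost)
```

The point of such policies is that pure recency (LRU) ignores both movement-driven access probability and the highly variable cost of re-fetching data in a MANET.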
On the Design of Fast Pseudo-Random Number Generators for the Cell Broadband Engine and an Application to Risk Analysis
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.41
David A. Bader, Aparna Chandramowlishwaran, Virat Agarwal
Abstract: Numerical simulations in computational physics, biology, and finance often require high-quality, efficient parallel random number generators. We design and optimize several parallel pseudo-random number generators on the Cell Broadband Engine, with minimal correlation between the parallel streams: the linear congruential generator (LCG) with a 64-bit prime addend and the Mersenne Twister (MT) algorithm. Compared with current Intel and AMD microprocessors, our Cell/B.E. LCG and MT implementations achieve speedups of 33 and 29, respectively. We also explore two normalization techniques that transform uniform random numbers into a Gaussian distribution: the Gaussian averaging method and the Box-Muller transform (polar and Cartesian forms). Using these fast generators, we develop a parallel implementation of Value-at-Risk, a commonly used model for risk assessment in financial markets. To our knowledge, we have designed and implemented the fastest parallel pseudo-random number generators on the Cell/B.E.
Citations: 11
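
A scalar sketch of two ingredients named above, a 64-bit LCG and the polar form of the Box-Muller transform, written in plain Python rather than for the Cell/B.E. SPEs. The multiplier and addend are standard placeholder constants (Knuth's MMIX values), not the per-stream 64-bit prime addends the paper selects:

```python
import math

MASK64 = (1 << 64) - 1
A = 6364136223846793005   # placeholder multiplier (Knuth MMIX); the paper picks
C = 1442695040888963407   # 64-bit prime addends per parallel stream

def lcg(seed):
    """64-bit linear congruential generator yielding uniforms in [0, 1)."""
    x = seed & MASK64
    while True:
        x = (A * x + C) & MASK64
        yield (x >> 11) / float(1 << 53)   # keep the higher-quality top 53 bits

def box_muller_polar(uniform):
    """Polar (Marsaglia) Box-Muller: turn uniform deviates into standard normals."""
    while True:
        u = 2.0 * next(uniform) - 1.0
        v = 2.0 * next(uniform) - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:                  # accept ~78.5% of candidate pairs
            r = math.sqrt(-2.0 * math.log(s) / s)
            yield u * r
            yield v * r

gauss = box_muller_polar(lcg(seed=42))
sample = [next(gauss) for _ in range(20000)]
mean = sum(sample) / len(sample)
var = sum((z - mean) ** 2 for z in sample) / len(sample)
print(round(mean, 2), round(var, 2))       # close to (0, 1)
```

The polar form avoids the sine/cosine of the Cartesian form at the price of a rejection loop, a trade-off that matters on SIMD hardware like the SPEs where branches are expensive.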
On Clustering Tasks in IC-Optimal Dags
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.14
M. Sims, G. Cordasco, A. Rosenberg
Abstract: Strategies are developed for "fattening" the tasks of computation-dags so as to accommodate the heterogeneity of remote clients in Internet-based computing (IC). Earlier work developed the underpinnings of IC-scheduling theory, an algorithmic framework for scheduling computations with intertask dependencies for IC. The theory's schedules strive to render tasks eligible for execution at the maximum possible rate, so as to (a) utilize remote clients' computational resources well, by enhancing the likelihood of having work to allocate to an available client, and (b) lessen the likelihood of a computation stalling for lack of tasks eligible for allocation. The current study begins to enhance IC-scheduling theory so that it can accommodate the varying computational resources of remote clients. The techniques developed here render a dag multi-granular by clustering its tasks. Several clustering strategies are developed: one works for any dag but produces only a limited variety of "fattened" tasks; others exploit the detailed structure of the dag being scheduled but allow a broad range of "fattened" tasks.
Citations: 6
TFlux: A Portable Platform for Data-Driven Multithreading on Commodity Multicore Systems
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.74
Kyriakos Stavrou, Marios Nikolaides, Demos Pavlou, Samer Arandi, P. Evripidou, P. Trancoso
Abstract: In this paper we present Thread Flux (TFlux), a complete system that supports the data-driven multithreading (DDM) model of execution. TFlux virtualizes the details of the underlying system, thereby offering the same programming model independently of the architecture. To achieve this goal, TFlux has runtime support built on top of a commodity operating system. Scheduling of threads is performed by the thread synchronization unit (TSU), which can be implemented either as a hardware or a software module. In addition, TFlux includes a preprocessor that, along with a set of simple compiler directives, allows the user to easily develop DDM programs. The preprocessor automatically produces the TFlux code, which can be compiled by any commodity C compiler, thereby targeting any ISA. TFlux has been validated on three platforms: a Simics-based multicore system with a hardware TSU module (TFluxHard), a commodity 8-core Intel Core2 Quad-based system with a software TSU module (TFluxSoft), and a Cell/BE system with a software TSU module (TFluxCell). The experimental results show that the performance achieved is close to linear speedup: on average 21x for the 27-node TFluxHard, and 4.4x on the 6-node TFluxSoft and TFluxCell. Most importantly, the observed speedup is stable across the different platforms, allowing the benefits of DDM to be exploited on different commodity systems.
Citations: 45
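
The core of a software TSU can be sketched as a ready-count scheduler: each DDM thread records how many producers it waits on, completion of a thread decrements the counts of its consumers, and a thread fires once its count reaches zero. The sketch below is a plain-Python illustration of that scheme under assumed thread names, not TFlux code:

```python
from collections import deque

class SoftwareTSU:
    """Minimal data-driven scheduler: a thread fires when its ready count hits zero."""
    def __init__(self):
        self.body = {}       # thread id -> callable
        self.consumers = {}  # thread id -> ids of threads it feeds
        self.pending = {}    # thread id -> number of unfinished producers

    def add_thread(self, tid, fn, producers=()):
        self.body[tid] = fn
        self.pending[tid] = len(producers)
        self.consumers.setdefault(tid, [])
        for p in producers:
            self.consumers.setdefault(p, []).append(tid)

    def run(self):
        order = []
        ready = deque(t for t, n in self.pending.items() if n == 0)
        while ready:
            tid = ready.popleft()
            self.body[tid]()                 # execute the DDM thread body
            order.append(tid)
            for c in self.consumers[tid]:    # notify consumers of completion
                self.pending[c] -= 1
                if self.pending[c] == 0:
                    ready.append(c)
        return order

tsu = SoftwareTSU()
log = []
tsu.add_thread("load", lambda: log.append("load"))
tsu.add_thread("fft_a", lambda: log.append("fft_a"), producers=["load"])
tsu.add_thread("fft_b", lambda: log.append("fft_b"), producers=["load"])
tsu.add_thread("sum", lambda: log.append("sum"), producers=["fft_a", "fft_b"])
schedule = tsu.run()
print(schedule)  # "load" first, "sum" last; fft_a and fft_b could run in parallel
```

In TFlux the same bookkeeping is done either by a hardware TSU (TFluxHard) or by a dedicated software module (TFluxSoft/TFluxCell); the point of the sketch is only the firing rule.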
IMCa: A High Performance Caching Front-End for GlusterFS on InfiniBand
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.84
R. Noronha, D. Panda
Abstract: With the rapid advances in computing technology, there is an explosion of media that needs to be collected, cataloged, stored, and accessed. With the speed of disks not keeping pace with improvements in processor and network speed, the ability of network file systems to deliver data to demanding applications at an appropriate rate is diminishing. In this paper, we propose to enhance the performance of network file systems by providing an intermediate bank of cache servers between the client and server, called IMCa. Whenever possible, file-system operations from the client are serviced from the cache bank. We evaluate IMCa with a number of different benchmarks. The results of these experiments demonstrate that the intermediate cache architecture can reduce the latency of certain operations by up to 82% over the native implementation and up to 86% compared with the Lustre file system. In addition, we see improved performance for data-transfer operations in most cases and scenarios. Finally, the caching hierarchy helps us achieve better scalability of file-system operations.
Citations: 19
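
The architecture above inserts a cache tier between clients and the origin file server so that hot operations never reach the disks. A toy two-tier lookup makes the idea concrete; the paths, data, and hit accounting below are illustrative assumptions, not IMCa code or measurements:

```python
class TwoTierFS:
    """Client -> cache bank -> origin server lookup, counting where each read is served."""
    def __init__(self, origin):
        self.origin = origin          # path -> data on the GlusterFS-like origin server
        self.cache_bank = {}          # shared store of the intermediate cache servers
        self.hits = {"cache": 0, "origin": 0}

    def read(self, path):
        if path in self.cache_bank:   # serviced by the cache bank (fast path)
            self.hits["cache"] += 1
            return self.cache_bank[path]
        self.hits["origin"] += 1      # miss: fetch from origin, populate the bank
        data = self.origin[path]
        self.cache_bank[path] = data
        return data

fs = TwoTierFS({"/data/a": b"aaaa", "/data/b": b"bbbb"})
for _ in range(3):
    fs.read("/data/a")                # one origin fetch, then two cache hits
fs.read("/data/b")
print(fs.hits)                        # -> {'cache': 2, 'origin': 2}
```

In the real system the cache bank is a set of dedicated servers reached over InfiniBand, which is why even a cache hit remains a network operation, just a much cheaper one than a disk-backed origin access.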
Machine Learning Models to Predict Performance of Computer System Design Alternatives
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.36
Berkin Özisikyilmaz, G. Memik, A. Choudhary
Abstract: Computer manufacturers spend a huge amount of time, resources, and money designing new systems and configurations, and their ability to reduce costs, charge competitive prices, and gain market share depends on how well these systems perform. In this work, we concentrate on both the system-design and architectural-design processes for parallel computers and develop methods to expedite them. Our methodology relies on extracting the performance levels of a small fraction of the machines in the design space and using this information to develop linear regression and neural network models that predict the performance of any machine in the whole design space. In terms of architectural design, we show that by using only 1% of the design space (i.e., cycle-accurate simulations), we can predict the performance of the whole design space within a 3.4% error rate. In the system-design area, we utilize previously published Standard Performance Evaluation Corporation (SPEC) benchmark numbers to predict the performance of future systems. We concentrate on multiprocessor systems and show that our models can predict the performance of future systems within a 2.2% error rate on average. We believe that these tools can accelerate design-space exploration significantly and help reduce research/development cost and time-to-market.
Citations: 21
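
The methodology (fit a cheap model on a small sampled fraction of the design space, then predict everything else) can be sketched with one-variable linear regression. The synthetic "design space" below is invented for illustration; the paper's actual models are multivariate linear regressions and neural networks over real simulation and SPEC data:

```python
import random

# Synthetic design space: performance as an exact linear function of one design knob.
design_space = [(cores, 2.0 + 1.5 * cores) for cores in range(1, 101)]

random.seed(0)
sample = random.sample(design_space, 10)          # "simulate" only 10% of the space

# Closed-form least squares for y = a + b * x on the sampled points.
n = len(sample)
sx = sum(x for x, _ in sample); sy = sum(y for _, y in sample)
sxx = sum(x * x for x, _ in sample); sxy = sum(x * y for x, y in sample)
b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
a = (sy - b * sx) / n

# Predict the whole design space and measure the mean relative error.
errors = [abs((a + b * x) - y) / y for x, y in design_space]
print(round(100 * sum(errors) / len(errors), 4), "% mean error")
```

On this noise-free toy data the error is essentially zero; the substance of the paper is that even on real, noisy design spaces a 1% sample suffices for a few-percent error rate.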