2008 37th International Conference on Parallel Processing: Latest Publications

On the Potentials of Segment-Based Routing for NoCs
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.56
A. Mejia, J. Flich, J. Duato
Abstract: The topology, the routing algorithm, and the way the traffic pattern is distributed over the network influence the ultimate performance of the interconnection network. Off-chip high-performance interconnects provide mechanisms to support irregular topologies, whereas in on-chip networks the topology is fixed at design time. Continuing trends in device miniaturization and high-volume manufacturing increase the probability of faults in embedded systems, leading to irregular topologies. Partitionability and virtualization of the entire on-chip network are also envisioned for future systems. These trends create a need for routing algorithms that adapt to static or dynamic changes in irregular topologies. In this paper we analyze the benefits of reconfiguration at the routing-algorithm level in order to accommodate topology changes, i.e., changes that appear in the network due to switch or link failures, energy-reduction decisions, or design and manufacturing issues. We perform an exhaustive analysis of the performance impact of the routing algorithm in a NoC system, with the aim of enabling reconfiguration of the routing algorithm. We take advantage of the flexibility offered by the segment-based routing (SR) methodology, which allows fast computation of many deadlock-free routing algorithms by applying different segmentation processes and routing-restriction policies. This study analyzes the potentials offered by SR. Results show that the choice of routing algorithm may greatly affect the final performance of the network. Additionally, we propose an organized segmentation process that achieves reliable performance with low variability for all topologies studied under uniform traffic conditions. These results encourage us to search for a dynamic mechanism that adapts the routing algorithm to the traffic.
Citations: 35
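
The SR methodology above generates many candidate routing functions by placing routing restrictions; deadlock freedom of any such function can be verified by checking that its channel dependency graph is acyclic. The sketch below is not the SR algorithm itself, but a minimal illustration of that check for plain XY routing on a 3x3 mesh (the mesh size and the `xy_route` helper are assumptions for illustration):

```python
from itertools import product

def xy_route(src, dst):
    """Return the list of channels (directed links) on the XY route src -> dst."""
    (x, y), (dx, dy) = src, dst
    path = []
    while x != dx:                       # X dimension first
        nx = x + (1 if dx > x else -1)
        path.append(((x, y), (nx, y)))
        x = nx
    while y != dy:                       # then Y dimension
        ny = y + (1 if dy > y else -1)
        path.append(((x, y), (x, ny)))
        y = ny
    return path

def channel_dependency_graph(nodes, route):
    """Edge c1 -> c2 whenever some route uses channel c2 immediately after c1."""
    deps = {}
    for src, dst in product(nodes, repeat=2):
        p = route(src, dst)
        for c in p:
            deps.setdefault(c, set())
        for c1, c2 in zip(p, p[1:]):
            deps[c1].add(c2)
    return deps

def is_acyclic(deps):
    WHITE, GREY, BLACK = 0, 1, 2
    color = {c: WHITE for c in deps}
    def dfs(c):
        color[c] = GREY
        for n in deps[c]:
            if color[n] == GREY or (color[n] == WHITE and not dfs(n)):
                return False                 # grey neighbor => cycle
        color[c] = BLACK
        return True
    return all(color[c] != WHITE or dfs(c) for c in deps)

nodes = [(x, y) for x in range(3) for y in range(3)]
print(is_acyclic(channel_dependency_graph(nodes, xy_route)))  # XY routing is deadlock-free
```

The same check applies unchanged to any routing function produced by a segmentation process: only the `route` callable changes.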
Two-Level Reorder Buffers: Accelerating Memory-Bound Applications on SMT Architectures
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.24
Jason Loew, D. Ponomarev
Abstract: We propose a low-complexity mechanism for accelerating memory-bound threads on SMT processors without adversely impacting the performance of other concurrently running applications. The main idea is a two-level organization of the Reorder Buffer (ROB), where the first level consists of small private per-thread ROBs used in the normal course of execution in the absence of last-level cache misses. The second ROB level is a much larger storage that can be used on demand by threads experiencing last-level cache misses. The key feature of our scheme is that a second-level ROB partition is allocated to a thread experiencing a last-level cache miss only if the number of instructions dependent on the missing load is below a predetermined threshold. We introduce a novel low-complexity mechanism to count the number of load-dependent instructions and propose two schemes for allocating the second-level ROB: predictive and reactive. Our results demonstrate about a 30% improvement over the DCRA resource-distribution mechanism in terms of the "harmonic mean of weighted IPCs" metric.
Citations: 3
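
The evaluation metric named above is straightforward to compute: each thread's weighted IPC is its IPC when co-scheduled divided by its IPC when running alone, and the metric is the harmonic mean of those ratios. A minimal sketch (the sample IPC numbers are made up, not from the paper):

```python
def hmean_weighted_ipc(smt_ipc, alone_ipc):
    """Harmonic mean of weighted IPCs, where weighted IPC_i = IPC_i(SMT) / IPC_i(alone)."""
    weighted = [s / a for s, a in zip(smt_ipc, alone_ipc)]
    return len(weighted) / sum(1.0 / w for w in weighted)

# Two co-scheduled threads, each running at half its standalone IPC:
print(hmean_weighted_ipc([0.8, 0.6], [1.6, 1.2]))  # -> 0.5
```

The harmonic mean penalizes configurations that starve one thread to boost another, which is why it is preferred over a plain IPC sum for SMT fairness comparisons.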
The MAP3S Static-and-Regular Mesh Simulation and Wavefront Parallel-Programming Patterns
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.34
R. Niewiadomski, J. N. Amaral, D. Szafron
Abstract: This paper presents the simulation and wavefront parallel-programming patterns of the MAP3S pattern-based parallel-programming system for distributed-memory environments. Both patterns target iterative computations on static and regular meshes. In addition to performance-oriented features, such as asynchronous communication and a distribution of the computational workload tailored to fit the computation, the patterns provide usability-oriented features, such as direct mesh access, mesh memory-footprint distribution, and a versatile data-dependency specification scripting language. Parallel programs developed using MAP3S achieve significant performance gains and capability enhancements on both low-end and high-end interconnect-equipped distributed-memory systems.
Citations: 1
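
The wavefront pattern targeted by MAP3S processes a regular mesh along anti-diagonals: cell (i, j) depends on its north and west neighbors, so all cells on one anti-diagonal are independent and can run in parallel. A minimal sequential sketch of that dependency sweep (the path-counting update rule is a stand-in for a real stencil, not MAP3S code):

```python
def wavefront_sweep(n, m):
    """Sweep an n x m mesh in anti-diagonal order; cell (i, j) reads (i-1, j) and (i, j-1)."""
    val = [[0] * m for _ in range(n)]
    for d in range(n + m - 1):                       # one anti-diagonal per step
        for i in range(max(0, d - m + 1), min(n, d + 1)):   # cells on this diagonal
            j = d - i                                #   are mutually independent
            if i == 0 and j == 0:
                val[i][j] = 1
            else:
                west = val[i][j - 1] if j > 0 else 0
                north = val[i - 1][j] if i > 0 else 0
                val[i][j] = west + north             # example update: monotone path count
    return val

print(wavefront_sweep(3, 3)[2][2])  # -> 6 paths from (0,0) to (2,2)
```

In a distributed-memory implementation, the inner loop over a diagonal is what gets partitioned across processes, with halo exchanges between successive diagonals.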
Prefix Computation and Sorting in Dual-Cube
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.18
Yamin Li, S. Peng, Wanming Chu
Abstract: In this paper, we describe two algorithmic techniques for the design of efficient algorithms in the dual-cube. The first uses the cluster structure of the dual-cube, and the second uses its recursive structure. Based on these two techniques, we propose efficient algorithms for parallel prefix computation and sorting in the dual-cube. For a dual-cube D_n with 2^(2n-1) nodes and n links per node, the communication and computation times of the parallel prefix algorithm are at most 2n+1 and 4n-2, respectively; those of the sorting algorithm are at most 6n^2 and 2n^2, respectively.
Citations: 3
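
For context on the bounds above, the classic parallel prefix algorithm on an ordinary d-dimensional hypercube (the network the dual-cube is derived from) can be simulated as below. This is the textbook hypercube algorithm, not the paper's dual-cube variant; each node keeps a running prefix and a subcube total, and in round k it exchanges totals with its neighbor across dimension k:

```python
def hypercube_prefix_sum(values):
    """Simulate parallel prefix sums on a hypercube with len(values) = 2^d nodes."""
    p = len(values)
    d = p.bit_length() - 1
    assert 1 << d == p, "node count must be a power of two"
    prefix = list(values)   # prefix[i]: running sum of values[0..i]
    total = list(values)    # total[i]: sum over node i's current subcube
    for k in range(d):
        new_total = total[:]
        for i in range(p):                   # all nodes exchange in parallel
            partner = i ^ (1 << k)
            new_total[i] = total[i] + total[partner]
            if partner < i:                  # partner's subcube holds earlier ranks
                prefix[i] += total[partner]
        total = new_total
    return prefix

print(hypercube_prefix_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # -> [1, 3, 6, 10, 15, 21, 28, 36]
```

The hypercube version needs d communication rounds for 2^d nodes; the paper's contribution is achieving comparable bounds on the dual-cube, which has far fewer links per node (n links for 2^(2n-1) nodes).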
Location Dependent Cooperative Caching in MANET
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.26
Yilin Wang, E. Chan, Wenzhong Li, Sanglu Lu
Abstract: Location-dependent information services (LDISs) have gained increasing popularity in recent years. Due to limited client power and intermittent connectivity, caching is an important approach to improving the performance of LDISs. In this paper, we propose a new replacement policy called location-dependent cooperative caching (LDCC). Unlike existing location-dependent cache-replacement policies, the LDCC strategy applies a prediction model to approximate client movement behavior and a probabilistic transition model to analyze the communication cost, yielding a substantial improvement in overall performance. Simulation results demonstrate that the proposed strategy significantly outperforms existing policies.
Citations: 8
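
The general idea of weighting cached items by predicted future access and by communication cost can be illustrated with a generic cost-based victim-selection rule. The scoring function below is a hypothetical stand-in, not the LDCC policy from the paper (which combines a movement-prediction model with a probabilistic transition model), and the item names and numbers are invented:

```python
def pick_victim(cache):
    """Evict the entry with the lowest expected re-fetch cost:
    score = P(accessed again) * cost of fetching it back over the network."""
    return min(cache, key=lambda k: cache[k]["p_access"] * cache[k]["fetch_cost"])

cache = {
    "map_tile_A": {"p_access": 0.60, "fetch_cost": 4.0},  # likely needed, cheap to refetch
    "map_tile_B": {"p_access": 0.10, "fetch_cost": 9.0},  # unlikely, moderately costly
    "map_tile_C": {"p_access": 0.50, "fetch_cost": 1.0},  # likely, but nearly free to refetch
}
print(pick_victim(cache))  # -> map_tile_C (lowest expected re-fetch cost)
```

The point of such policies is that pure recency (LRU) ignores both movement-driven access probability and the highly variable cost of re-fetching data in a MANET.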
On the Design of Fast Pseudo-Random Number Generators for the Cell Broadband Engine and an Application to Risk Analysis
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.41
David A. Bader, Aparna Chandramowlishwaran, Virat Agarwal
Abstract: Numerical simulations in computational physics, biology, and finance often require high-quality, efficient parallel random number generators. We design and optimize several parallel pseudo-random number generators on the Cell Broadband Engine, with minimal correlation between the parallel streams: the linear congruential generator (LCG) with a 64-bit prime addend and the Mersenne Twister (MT) algorithm. Compared with current Intel and AMD microprocessors, our Cell/B.E. LCG and MT implementations achieve speedups of 33 and 29, respectively. We also explore two normalization techniques that transform uniform random numbers into a Gaussian distribution: the Gaussian averaging method and the Box-Muller transform (polar and Cartesian forms). Using these fast generators, we develop a parallel implementation of Value-at-Risk, a commonly used model for risk assessment in financial markets. To our knowledge, we have designed and implemented the fastest parallel pseudo-random number generators on the Cell/B.E.
Citations: 11
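
A scalar sketch of two ingredients named above, a 64-bit LCG and the polar form of the Box-Muller transform, written in plain Python rather than for the Cell/B.E. SPEs. The multiplier and addend are standard placeholder constants (Knuth's MMIX values), not the per-stream 64-bit prime addends the paper selects:

```python
import math

MASK64 = (1 << 64) - 1
A = 6364136223846793005   # placeholder multiplier (Knuth MMIX); the paper picks
C = 1442695040888963407   # 64-bit prime addends per parallel stream

def lcg(seed):
    """64-bit linear congruential generator yielding uniforms in [0, 1)."""
    x = seed & MASK64
    while True:
        x = (A * x + C) & MASK64
        yield (x >> 11) / float(1 << 53)   # keep the higher-quality top 53 bits

def box_muller_polar(uniform):
    """Polar (Marsaglia) Box-Muller: turn uniform deviates into standard normals."""
    while True:
        u = 2.0 * next(uniform) - 1.0
        v = 2.0 * next(uniform) - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:                  # accept ~78.5% of candidate pairs
            r = math.sqrt(-2.0 * math.log(s) / s)
            yield u * r
            yield v * r

gauss = box_muller_polar(lcg(seed=42))
sample = [next(gauss) for _ in range(20000)]
mean = sum(sample) / len(sample)
var = sum((z - mean) ** 2 for z in sample) / len(sample)
print(round(mean, 2), round(var, 2))       # close to (0, 1)
```

The polar form avoids the sine/cosine of the Cartesian form at the price of a rejection loop, a trade-off that matters on SIMD hardware like the SPEs where branches are expensive.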
On Clustering Tasks in IC-Optimal Dags
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.14
M. Sims, G. Cordasco, A. Rosenberg
Abstract: Strategies are developed for "fattening" the tasks of computation-dags so as to accommodate the heterogeneity of remote clients in Internet-based computing (IC). Earlier work developed the underpinnings of IC-scheduling theory, an algorithmic framework for scheduling computations with intertask dependencies for IC. The theory's schedules strive to render tasks eligible for execution at the maximum possible rate, so as to (a) utilize remote clients' computational resources well, by enhancing the likelihood of having work to allocate to an available client, and (b) lessen the likelihood of a computation stalling for lack of tasks eligible for allocation. The current study begins to enhance IC-scheduling theory so that it can accommodate the varying computational resources of remote clients. The techniques developed here render a dag multi-granular by clustering its tasks. Several clustering strategies are developed: one works for any dag but produces only a limited variety of "fattened" tasks; others exploit the detailed structure of the dag being scheduled but allow a broad range of "fattened" tasks.
Citations: 6
TFlux: A Portable Platform for Data-Driven Multithreading on Commodity Multicore Systems
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.74
Kyriakos Stavrou, Marios Nikolaides, Demos Pavlou, Samer Arandi, P. Evripidou, P. Trancoso
Abstract: In this paper we present Thread Flux (TFlux), a complete system that supports the data-driven multithreading (DDM) model of execution. TFlux virtualizes the details of the underlying system, thereby offering the same programming model independently of the architecture. To achieve this goal, TFlux has runtime support built on top of a commodity operating system. Scheduling of threads is performed by the thread synchronization unit (TSU), which can be implemented either as a hardware or a software module. In addition, TFlux includes a preprocessor that, along with a set of simple compiler directives, allows the user to easily develop DDM programs. The preprocessor automatically produces the TFlux code, which can be compiled by any commodity C compiler, thereby targeting any ISA. TFlux has been validated on three platforms: a Simics-based multicore system with a hardware TSU module (TFluxHard), a commodity 8-core Intel Core2 Quad-based system with a software TSU module (TFluxSoft), and a Cell/BE system with a software TSU module (TFluxCell). The experimental results show that the performance achieved is close to linear speedup: on average 21x for the 27-node TFluxHard, and 4.4x on the 6-node TFluxSoft and TFluxCell. Most importantly, the observed speedup is stable across the different platforms, allowing the benefits of DDM to be exploited on different commodity systems.
Citations: 45
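
The core of a software TSU can be sketched as a ready-count scheduler: each DDM thread records how many producers it waits on, completion of a thread decrements the counts of its consumers, and a thread fires once its count reaches zero. The sketch below is a plain-Python illustration of that scheme under assumed thread names, not TFlux code:

```python
from collections import deque

class SoftwareTSU:
    """Minimal data-driven scheduler: a thread fires when its ready count hits zero."""
    def __init__(self):
        self.body = {}       # thread id -> callable
        self.consumers = {}  # thread id -> ids of threads it feeds
        self.pending = {}    # thread id -> number of unfinished producers

    def add_thread(self, tid, fn, producers=()):
        self.body[tid] = fn
        self.pending[tid] = len(producers)
        self.consumers.setdefault(tid, [])
        for p in producers:
            self.consumers.setdefault(p, []).append(tid)

    def run(self):
        order = []
        ready = deque(t for t, n in self.pending.items() if n == 0)
        while ready:
            tid = ready.popleft()
            self.body[tid]()                 # execute the DDM thread body
            order.append(tid)
            for c in self.consumers[tid]:    # notify consumers of completion
                self.pending[c] -= 1
                if self.pending[c] == 0:
                    ready.append(c)
        return order

tsu = SoftwareTSU()
log = []
tsu.add_thread("load", lambda: log.append("load"))
tsu.add_thread("fft_a", lambda: log.append("fft_a"), producers=["load"])
tsu.add_thread("fft_b", lambda: log.append("fft_b"), producers=["load"])
tsu.add_thread("sum", lambda: log.append("sum"), producers=["fft_a", "fft_b"])
schedule = tsu.run()
print(schedule)  # "load" first, "sum" last; fft_a and fft_b could run in parallel
```

In TFlux the same bookkeeping is done either by a hardware TSU (TFluxHard) or by a dedicated software module (TFluxSoft/TFluxCell); the point of the sketch is only the firing rule.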
IMCa: A High Performance Caching Front-End for GlusterFS on InfiniBand
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.84
R. Noronha, D. Panda
Abstract: With the rapid advances in computing technology, there is an explosion of media that needs to be collected, cataloged, stored, and accessed. With the speed of disks not keeping pace with improvements in processor and network speed, the ability of network file systems to deliver data to demanding applications at an appropriate rate is diminishing. In this paper, we propose to enhance the performance of network file systems by providing an intermediate bank of cache servers between the client and server, called IMCa. Whenever possible, file-system operations from the client are serviced from the cache bank. We evaluate IMCa with a number of different benchmarks. The results of these experiments demonstrate that the intermediate cache architecture can reduce the latency of certain operations by up to 82% over the native implementation and up to 86% compared with the Lustre file system. In addition, we see improved performance for data-transfer operations in most cases and scenarios. Finally, the caching hierarchy helps us achieve better scalability of file-system operations.
Citations: 19
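
The architecture above inserts a cache tier between clients and the origin file server so that hot operations never reach the disks. A toy two-tier lookup makes the idea concrete; the paths, data, and hit accounting below are illustrative assumptions, not IMCa code or measurements:

```python
class TwoTierFS:
    """Client -> cache bank -> origin server lookup, counting where each read is served."""
    def __init__(self, origin):
        self.origin = origin          # path -> data on the GlusterFS-like origin server
        self.cache_bank = {}          # shared store of the intermediate cache servers
        self.hits = {"cache": 0, "origin": 0}

    def read(self, path):
        if path in self.cache_bank:   # serviced by the cache bank (fast path)
            self.hits["cache"] += 1
            return self.cache_bank[path]
        self.hits["origin"] += 1      # miss: fetch from origin, populate the bank
        data = self.origin[path]
        self.cache_bank[path] = data
        return data

fs = TwoTierFS({"/data/a": b"aaaa", "/data/b": b"bbbb"})
for _ in range(3):
    fs.read("/data/a")                # one origin fetch, then two cache hits
fs.read("/data/b")
print(fs.hits)                        # -> {'cache': 2, 'origin': 2}
```

In the real system the cache bank is a set of dedicated servers reached over InfiniBand, which is why even a cache hit remains a network operation, just a much cheaper one than a disk-backed origin access.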
Machine Learning Models to Predict Performance of Computer System Design Alternatives
Pub Date: 2008-09-09 | DOI: 10.1109/ICPP.2008.36
Berkin Özisikyilmaz, G. Memik, A. Choudhary
Abstract: Computer manufacturers spend a huge amount of time, resources, and money designing new systems and configurations, and their ability to reduce costs, charge competitive prices, and gain market share depends on how well these systems perform. In this work, we concentrate on both the system-design and architectural-design processes for parallel computers and develop methods to expedite them. Our methodology relies on extracting the performance levels of a small fraction of the machines in the design space and using this information to develop linear regression and neural network models that predict the performance of any machine in the whole design space. In terms of architectural design, we show that by using only 1% of the design space (i.e., cycle-accurate simulations), we can predict the performance of the whole design space within a 3.4% error rate. In the system-design area, we utilize previously published Standard Performance Evaluation Corporation (SPEC) benchmark numbers to predict the performance of future systems. We concentrate on multiprocessor systems and show that our models can predict the performance of future systems within a 2.2% error rate on average. We believe that these tools can accelerate design-space exploration significantly and help reduce research/development cost and time-to-market.
Citations: 21
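
The methodology (fit a cheap model on a small sampled fraction of the design space, then predict everything else) can be sketched with one-variable linear regression. The synthetic "design space" below is invented for illustration; the paper's actual models are multivariate linear regressions and neural networks over real simulation and SPEC data:

```python
import random

# Synthetic design space: performance as an exact linear function of one design knob.
design_space = [(cores, 2.0 + 1.5 * cores) for cores in range(1, 101)]

random.seed(0)
sample = random.sample(design_space, 10)          # "simulate" only 10% of the space

# Closed-form least squares for y = a + b * x on the sampled points.
n = len(sample)
sx = sum(x for x, _ in sample); sy = sum(y for _, y in sample)
sxx = sum(x * x for x, _ in sample); sxy = sum(x * y for x, y in sample)
b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
a = (sy - b * sx) / n

# Predict the whole design space and measure the mean relative error.
errors = [abs((a + b * x) - y) / y for x, y in design_space]
print(round(100 * sum(errors) / len(errors), 4), "% mean error")
```

On this noise-free toy data the error is essentially zero; the substance of the paper is that even on real, noisy design spaces a 1% sample suffices for a few-percent error rate.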