2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum: Latest Articles (Page 4)

Deriving a Methodology for Code Deployment on Multi-Core Platforms via Iterative Manual Optimizations

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.178

Stuart McCool, P. Milligan, P. Sage

Abstract: In recent years, there has been what can only be described as an explosion in the types of processing devices one can expect to find within a given computer system. These include the multi-core CPU, the General Purpose Graphics Processing Unit (GPGPU) and the Accelerated Processing Unit (APU), to name but a few. The widespread uptake of these systems presents would-be users with at least two problems. Firstly, each device exposes a complex underlying architecture which must be appreciated in order to attain optimal performance. Secondly, a single system can support an arbitrary number of such devices. Consequently, fully leveraging the performance capabilities of such a system comes at a cost: increasingly prolonged development times. Adhering to a methodology will have the significant industrial impact of reducing these development times. This paper describes the continued formulation of such a novel methodology. Two real-world scientific programs are optimized for execution on the CUDA platform. Double precision accuracy and optimized speedups (which include PCI-E transfer times) of 15x and 17x are achieved.

Citations: 1

A Highly Efficient Consolidated Platform for Stream Computing and Hadoop

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.252

H. Matsuura, Masaru Ganse, T. Suzumura

Citations: 5

An Energy-Optimum and Communication-Time Efficient Protocol for Allocation, Scheduling and Routing in Wireless Networks

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.104

Thiago F. Neves, Marcos F. Caetano, J. Bordim

Citations: 4

BLOR: Bandwidth and Latency Sensitive Overlay Routing for Flash Data Dissemination

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.21

Xiaoyong Li, Yijie Wang, Yongquan Fu, Xiaoling Li, Weidong Sun

Abstract: Flash data dissemination transmits time-critical data to distributed receivers in a timely manner and is widely used in mission-critical applications. However, existing flash data dissemination approaches fail to guarantee timely transmission because the dissemination process is unpredictable. Overlay routing has been widely used as an efficient routing primitive that provides better end-to-end routing quality by detouring around inefficient routing paths in the P2P network. To improve the predictability of the flash data dissemination process, we propose BLOR, a bandwidth and latency sensitive overlay routing scheme that optimizes the overlay routing and avoids poorly performing data dissemination paths. BLOR tries to select optimal paths in terms of latency, bandwidth capacity and available bandwidth, which has never been studied before. Additionally, as part of the optimization process of BLOR, a location-aware unstructured overlay topology construction scheme is proposed to improve routing efficiency and data location, and an unbiased top-k dominating model is proposed to balance the multiple factors in path selection. Experimental results with real-world data sets confirm that BLOR significantly improves flash data dissemination.
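
The abstract above names a top-k dominating model for balancing latency, bandwidth capacity and available bandwidth, but gives no details. As a rough illustration only, the following Python sketch shows a plain top-k dominating query over candidate paths: a path dominates another if it is no worse on all three metrics and strictly better on at least one, and paths are ranked by how many others they dominate. The `Path` type, its field names, and the sample values in the usage note are hypothetical, and this is not the paper's "unbiased" variant.

```python
from typing import List, NamedTuple


class Path(NamedTuple):
    # Hypothetical record for one candidate overlay path.
    name: str
    latency_ms: float       # lower is better
    capacity_mbps: float    # higher is better
    available_mbps: float   # higher is better


def dominates(p: Path, q: Path) -> bool:
    """True if p is no worse than q on every metric and better on one."""
    no_worse = (p.latency_ms <= q.latency_ms
                and p.capacity_mbps >= q.capacity_mbps
                and p.available_mbps >= q.available_mbps)
    strictly_better = (p.latency_ms < q.latency_ms
                       or p.capacity_mbps > q.capacity_mbps
                       or p.available_mbps > q.available_mbps)
    return no_worse and strictly_better


def top_k_dominating(paths: List[Path], k: int) -> List[Path]:
    """Return the k paths that dominate the most other paths."""
    # Score each path by the number of candidates it dominates.
    scored = [(sum(dominates(p, q) for q in paths), p) for p in paths]
    # Stable sort: ties keep their original input order.
    scored.sort(key=lambda item: -item[0])
    return [p for _, p in scored[:k]]
```

For example, with four made-up paths A (10 ms, 100, 80), B (20 ms, 100, 60), C (5 ms, 50, 40) and D (30 ms, 40, 30), A dominates B and D, while B and C each dominate only D, so `top_k_dominating(paths, 2)` returns A followed by B.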

Citations: 1

A Heterogeneous Cache Distribution with Reconfigurable Interconnect

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.31

Aishwariya Pattabiraman, A. Avakian, R. Vemuri

Abstract: Current trends in multicore research suggest that hundreds of cores will be integrated on a single chip in the near future for increased performance. This trend presents a set of challenges, one of which is cache distribution among the cores. Network on chip with homogeneous cache distribution among the routers has become mainstream in the literature. In this paper, we propose a heterogeneous distribution of cache blocks to routers. The heterogeneity, together with appropriate scheduling by the OS, helps to reduce network hops by placing more cache blocks closer to the cores executing data-intensive applications. We show that this distribution reduces cache access overhead by as much as 20%. Furthermore, we propose a reconfigurable heterogeneous cache architecture for multi-threaded workloads. In this scheme, cache blocks are reassigned to routers based on data needs. A constructive heuristic is presented which gives the optimal cache configuration and page coloring for each workload. We show that this approach can effectively reduce cache access time by as much as 61%.

Citations: 0

An On-Demand Fast Parallel Pseudo Random Number Generator with Applications

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.212

D. Banerjee, A. Bahl, Kishore Kothapalli

Abstract: The use of manycore architectures and accelerators, such as GPUs, with good programmability has allowed them to be deployed for vital computational work. The ability to use randomness in computation is known to help in several situations. For such computations to be made possible on a general purpose computer, a source of randomness, or in general a pseudo random number generator (PRNG), is essential. However, most of the PRNGs currently available on GPUs suffer from some basic drawbacks that we highlight in this paper. It is of high interest therefore to develop a parallel, quality PRNG that also works in an on-demand model. In this paper we investigate a CPU+GPU hybrid technique to create an efficient PRNG. The basic technique we apply is that of random walks on expander graphs. Unlike existing generators available in the GPU programming environment, our generator can produce random numbers on demand as opposed to a one-time generation. Our approach produces 0.07 GNumbers per second. The quality of our generator is tested with industry standard tests. We also demonstrate two applications of our PRNG. We apply our PRNG to design a list ranking algorithm which demonstrates the on-demand nature of the algorithm and a Monte Carlo simulation which shows the high quality of our generator.
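
The abstract above states that the generator is built on random walks on expander graphs but does not say which graph is used. Purely as an illustration of the general idea, the Python sketch below walks the 8-regular Margulis-Gabber-Galil graph on Z_m x Z_m and emits the current vertex as a number each time one is requested. The class name, the parameters `m` and `steps_per_output`, and the use of Python's `random.Random` as a stand-in for the externally supplied random bits are all assumptions; this is a single-threaded sketch, not the paper's CPU+GPU design.

```python
import random


class ExpanderWalkPRNG:
    """Hypothetical on-demand generator: a random walk on an expander graph."""

    def __init__(self, m: int = 1 << 16, seed: int = 12345,
                 steps_per_output: int = 8) -> None:
        self.m = m
        self.steps = steps_per_output
        # Stand-in for the source of bits that chooses each edge of the walk.
        self._bits = random.Random(seed)
        # Start the walk at a seeded vertex (x, y) in Z_m x Z_m.
        self.x = self._bits.randrange(m)
        self.y = self._bits.randrange(m)

    def _step(self) -> None:
        # Move to one of the 8 Margulis-Gabber-Galil neighbours:
        # (x +/- 2y, y), (x +/- (2y+1), y), (x, y +/- 2x), (x, y +/- (2x+1)).
        x, y, m = self.x, self.y, self.m
        move = self._bits.randrange(8)
        if move == 0:
            x = (x + 2 * y) % m
        elif move == 1:
            x = (x - 2 * y) % m
        elif move == 2:
            x = (x + 2 * y + 1) % m
        elif move == 3:
            x = (x - 2 * y - 1) % m
        elif move == 4:
            y = (y + 2 * x) % m
        elif move == 5:
            y = (y - 2 * x) % m
        elif move == 6:
            y = (y + 2 * x + 1) % m
        else:
            y = (y - 2 * x - 1) % m
        self.x, self.y = x, y

    def next(self) -> int:
        """Advance the walk and return the vertex as an int in [0, m*m)."""
        for _ in range(self.steps):
            self._step()
        return self.x * self.m + self.y
```

Because each call to `next()` only advances the walk from its current vertex, a caller can draw numbers one at a time as needed rather than generating a fixed batch up front, which is the on-demand property the abstract emphasizes.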

Citations: 5

A Massively Parallel Approach for Nonlinear Interdependency Analysis of Multivariate Signals with GPGPU

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.257

Dan Chen, Lizhe Wang, D. Cui, Dongchuan Lu, Xiaoli Li, S. Khan, J. Kolodziej

Citations: 2

On the Correctness of Mixing Lazy and Eager Version Management in Transactions

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.320

Lihang Zhao, J. Draper

Citations: 1

Parallel Multi-Temporal Remote Sensing Image Change Detection on GPU

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.234

Huming Zhu, Yu Cao, Zhiqiang Zhou, Maoguo Gong

Citations: 17

Fair Access to External Memory for Chip-multiprocessor

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.36

Shufan Yang, Qiang Wu, Xiongren Xiao, Renfa Li, Dominic Hillenbrand

Citations: 0