2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum: Latest Papers

Deriving a Methodology for Code Deployment on Multi-Core Platforms via Iterative Manual Optimizations
Stuart McCool, P. Milligan, P. Sage
{"title":"Deriving a Methodology for Code Deployment on Multi-Core Platforms via Iterative Manual Optimizations","authors":"Stuart McCool, P. Milligan, P. Sage","doi":"10.1109/IPDPSW.2012.178","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.178","url":null,"abstract":"In recent years, there has been what can only be described as an explosion in the types of processing devices one can expect to find within a given computer system. These include the multi-core CPU, the General Purpose Graphics Processing Unit (GPGPU) and the Accelerated Processing Unit (APU), to name but a few. The widespread uptake of these systems presents would-be users with at least two problems. Firstly, each device exposes a complex underlying architecture which must be appreciated in order to attain optimal performance. This is coupled with the fact that a single system can support an arbitrary number of such devices. Consequently, fully leveraging the performance capabilities of such a system must come at a cost -- increasingly prolonged development times. Adhering to a methodology will have the significant industrial impact of reducing these development times. This paper describes the continued formulation of such a novel methodology. Two real world scientific programs are optimized for execution on the CUDA platform. Double precision accuracy and optimized speedups (which include PCI-E transfer times) of 15x and 17x are achieved.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133843649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Highly Efficient Consolidated Platform for Stream Computing and Hadoop
H. Matsuura, Masaru Ganse, T. Suzumura
{"title":"A Highly Efficient Consolidated Platform for Stream Computing and Hadoop","authors":"H. Matsuura, Masaru Ganse, T. Suzumura","doi":"10.1109/IPDPSW.2012.252","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.252","url":null,"abstract":"Data Stream Processing or stream computing is the new computing paradigm for processing a massive amount of streaming data in real time without storing them in secondary storage. In this paper we propose an integrated execution platform for Data Stream Processing and Hadoop with a dynamic load balancing mechanism to realize an efficient operation of computer systems and reduction of latency of Data Stream Processing. Our implementation is built on top of System S, a distributed data stream processing system developed by IBM Research. Our experimental results show that our load balancing mechanism could increase CPU usage from 47.77% to 72.14% when compared to the one with no load balancing. Moreover, the results show that latency for stream processing jobs is kept low even in a bursty situation by dynamically allocating more compute resources to stream processing jobs.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115236921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
An Energy-Optimum and Communication-Time Efficient Protocol for Allocation, Scheduling and Routing in Wireless Networks
Thiago F. Neves, Marcos F. Caetano, J. Bordim
{"title":"An Energy-Optimum and Communication-Time Efficient Protocol for Allocation, Scheduling and Routing in Wireless Networks","authors":"Thiago F. Neves, Marcos F. Caetano, J. Bordim","doi":"10.1109/IPDPSW.2012.104","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.104","url":null,"abstract":"The growing demand for mobile wireless access has stimulated the emergence of new communication technologies. Opportunistic Spectrum Access (OSA) is viewed as a promising alternative to overcome the problems caused by static spectrum assignment. Opportunistic access allows dynamic mapping of the transmission needs and communication opportunities. However, performing this task efficiently is not trivial. Indeed, it has been shown to be NP-complete. In this context, this paper presents an efficient heuristic for solving the problem of channel allocation and routing, according to the opportunities and channels available. The proposed heuristic is optimal in terms of energy consumption and close to the optimum, about 5% above it, in terms of transmission time.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124172325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
BLOR: Bandwidth and Latency Sensitive Overlay Routing for Flash Data Dissemination
Xiaoyong Li, Yijie Wang, Yongquan Fu, Xiaoling Li, Weidong Sun
{"title":"BLOR: Bandwidth and Latency Sensitive Overlay Routing for Flash Data Dissemination","authors":"Xiaoyong Li, Yijie Wang, Yongquan Fu, Xiaoling Li, Weidong Sun","doi":"10.1109/IPDPSW.2012.21","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.21","url":null,"abstract":"Flash data dissemination transmits time-critical data to distributed receivers in a timely manner, which is widely used in many mission-critical applications. However, existing flash data dissemination approaches fail to guarantee the timely transmission due to the unpredictability of the dissemination process. Overlay routing has been widely used as an efficient routing primitive for providing better end-to-end routing quality, based on detouring inefficient routing paths in the P2P network. To improve the predictability of the flash data dissemination process, we propose a bandwidth and latency sensitive overlay routing scheme BLOR, by optimizing the overlay routing and avoiding poor performance of the data dissemination paths. BLOR tries to select optimal paths in terms of latency, bandwidth capacity and available bandwidth in nature, which has never been studied before. Additionally, in the optimization process of BLOR, a location-aware unstructured overlay topology construction scheme is proposed to improve the routing efficiency and data location, and an unbiased top-k dominating model is proposed to balance the multiple factors in path selection. Experimental results with real-world data sets confirm that BLOR significantly improves flash data dissemination.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114457281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Heterogeneous Cache Distribution with Reconfigurable Interconnect
Aishwariya Pattabiraman, A. Avakian, R. Vemuri
{"title":"A Heterogeneous Cache Distribution with Reconfigurable Interconnect","authors":"Aishwariya Pattabiraman, A. Avakian, R. Vemuri","doi":"10.1109/IPDPSW.2012.31","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.31","url":null,"abstract":"Current trends in multicore research suggest that hundreds of cores will be integrated on a single chip in the near future for increased performance. This new trend presents a set of challenges, one of which is cache distribution among the cores. Network on chip with homogeneous cache distribution among the routers has become mainstream in literature. In this paper, we propose having a heterogeneous distribution of cache blocks to routers. The heterogeneity and the appropriate scheduling by the OS will help to reduce network hops by placing more cache blocks closer to the cores executing data intensive applications. We show that this distribution reduces cache access overhead by as much as 20%. Furthermore, we also propose a reconfigurable heterogeneous cache architecture for multi-threaded workloads. In this scheme, cache blocks are reassigned to routers based on data needs. A constructive heuristic has been presented which gives the optimal cache configuration and page coloring for each workload. We show that this approach can effectively reduce cache access time by as much as 61%.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117282239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An On-Demand Fast Parallel Pseudo Random Number Generator with Applications
D. Banerjee, A. Bahl, Kishore Kothapalli
{"title":"An On-Demand Fast Parallel Pseudo Random Number Generator with Applications","authors":"D. Banerjee, A. Bahl, Kishore Kothapalli","doi":"10.1109/IPDPSW.2012.212","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.212","url":null,"abstract":"The use of manycore architectures and accelerators, such as GPUs, with good programmability has allowed them to be deployed for vital computational work. The ability to use randomness in computation is known to help in several situations. For such computations to be made possible on a general purpose computer, a source of randomness, or in general a pseudo random number generator (PRNG), is essential. However, most of the PRNGs currently available on GPUs suffer from some basic drawbacks that we highlight in this paper. It is of high interest therefore to develop a parallel, quality PRNG that also works in an on-demand model. In this paper we investigate a CPU+GPU hybrid technique to create an efficient PRNG. The basic technique we apply is that of random walks on expander graphs. Unlike existing generators available in the GPU programming environment, our generator can produce random numbers on demand as opposed to a one-time generation. Our approach produces 0.07 GNumbers per second. The quality of our generator is tested with industry standard tests. We also demonstrate two applications of our PRNG. We apply our PRNG to design a list ranking algorithm which demonstrates the on-demand nature of the algorithm and a Monte Carlo simulation which shows the high quality of our generator.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123604750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
A Massively Parallel Approach for Nonlinear Interdependency Analysis of Multivariate Signals with GPGPU
Dan Chen, Lizhe Wang, D. Cui, Dongchuan Lu, Xiaoli Li, S. Khan, J. Kolodziej
{"title":"A Massively Parallel Approach for Nonlinear Interdependency Analysis of Multivariate Signals with GPGPU","authors":"Dan Chen, Lizhe Wang, D. Cui, Dongchuan Lu, Xiaoli Li, S. Khan, J. Kolodziej","doi":"10.1109/IPDPSW.2012.257","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.257","url":null,"abstract":"Nonlinear interdependency (NLI) analysis is an effective method for measurement of synchronization among brain regions, which is an important feature of normal and abnormal brain functions. But its application in practice has long been largely hampered by the ultra-high complexity of the NLI algorithms. We developed a massively parallel approach to address this problem. The approach has dramatically improved the runtime performance. It also enabled NLI analysis on multivariate signals which was previously impossible.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121899046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
On the Correctness of Mixing Lazy and Eager Version Management in Transactions
Lihang Zhao, J. Draper
{"title":"On the Correctness of Mixing Lazy and Eager Version Management in Transactions","authors":"Lihang Zhao, J. Draper","doi":"10.1109/IPDPSW.2012.320","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.320","url":null,"abstract":"Transactional memory has been proposed as an optimistic concurrency-control construct to ease parallel programming. Hardware transactional memory (HTM) approaches implement version management and conflict detection in hardware to guarantee the correctness of transaction execution. Based on the style of version management and conflict detection, state-of-the-art HTM systems fall into two main types, namely lazy systems and eager systems. Neither system type is able to always perform better than the other over a wide range of applications due to the broad variations in the execution behaviors in typical parallel workloads. In this paper, we focus on the correctness of mixing lazy and eager version management in a log-based HTM. This hybrid type of version management is demonstrated to satisfy the requirement of atomicity and conflict serializability, which are critical for correctness. Furthermore, it is shown that the new approach does not violate the memory coherence formal model.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117133153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Parallel Multi-Temporal Remote Sensing Image Change Detection on GPU
Huming Zhu, Yu Cao, Zhiqiang Zhou, Maoguo Gong
{"title":"Parallel Multi-Temporal Remote Sensing Image Change Detection on GPU","authors":"Huming Zhu, Yu Cao, Zhiqiang Zhou, Maoguo Gong","doi":"10.1109/IPDPSW.2012.234","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.234","url":null,"abstract":"Change detection is an important technique in the damage assessment area. As the amount of remote sensing images and the complexity of algorithms rise, the demand for processing power is increasing. In this paper, we propose PLog-FLICM, a parallel algorithm for change detection. It is implemented on AMD Accelerated Parallel Processing (APP) SDK v2 based on Open Computing Language. The parallel characteristics and implementation details of the proposed PLog-FLICM algorithm are presented. Experiments on several Synthetic Aperture Radar (SAR) images demonstrate that the proposed algorithm outperforms other algorithms, and the designed parallel algorithm can greatly reduce the computational time of the change detection algorithm. It has achieved speedups of between 63 and 145 times on AMD Radeon HD 6870 Graphics Processing Unit (GPU).","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"45 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120806400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
Fair Access to External Memory for Chip-multiprocessor
Shufan Yang, Qiang Wu, Xiongren Xiao, Renfa Li, Dominic Hillenbrand
{"title":"Fair Access to External Memory for Chip-multiprocessor","authors":"Shufan Yang, Qiang Wu, Xiongren Xiao, Renfa Li, Dominic Hillenbrand","doi":"10.1109/IPDPSW.2012.36","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.36","url":null,"abstract":"A memory arbitration scheme is required for a chip-multiprocessor accessing external (off-chip) memory. Round-robin arbitration is an efficient and low-cost candidate for chip-multiprocessors. However, the adoption of a simple round-robin scheme raises a potential problem of uneven sharing of external memory bandwidth. One way to manage an unbalanced resource allocation resulting from the use of arbiters is to use priority arbitrations combined with a look-up table, but this is not economical since large look-up tables require a large silicon area. We propose an admission control scheme operating at the edge of the on-chip interconnection. The fair-sharing service is guaranteed using a token mechanism to schedule packets onto the fabric. This work shows that a fair service problem can be solved by controlling the traffic of the on-chip interconnect, since doing so avoids saturation in critical paths and maintains equilibrium in the allocation of resources.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128245515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0