{"title":"Generalizing the Utility of GPUs in Large-Scale Heterogeneous Computing Systems","authors":"S. Xiao, Wu-chun Feng","doi":"10.1109/IPDPSW.2012.325","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.325","url":null,"abstract":"Graphics Processing Units (GPUs) have been widely used as accelerators in large-scale heterogeneous computing systems. However, current programming models can only support the utilization of local GPUs. When using non-local GPUs, programmers need to explicitly call API functions for data communication across computing nodes. As such, programming GPUs in large-scale computing systems is more challenging than local GPUs since local and remote GPUs have to be dealt with separately. In this work, we propose a virtual OpenCL (VOCL) framework to support the transparent virtualization of GPUs. This framework, based on the OpenCL programming model, exposes physical GPUs as decoupled virtual resources that can be transparently managed independent of the application execution. To reduce the virtualization overhead, we optimize the GPU memory accesses and kernel launches. We also extend the VOCL framework to support live task migration across physical GPUs to achieve load balance and/or quick system maintenance. Our experiment results indicate that VOCL can greatly simplify the task of programming cluster-based GPUs at a reasonable virtualization cost.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"171 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133310246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Simulation Study on Urban Water Threat Detection in Modern Cyberinfrastructures","authors":"Lizhe Wang, Dan Chen, Ze Deng, R. Ranjan","doi":"10.1109/IPDPSW.2012.127","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.127","url":null,"abstract":"The computation of Contaminant Source Characterization (CSC) is a critical research issue in Water Distribution System (WDS) management. We use a simulation framework to identify optimized locations of sensors that lead to fast detection of contamination sources. The optimization engine is based on a Genetic Algorithm (GA) that interprets trial solutions as individuals. During the optimization process many thousands of these solutions are generated. For a large WDS, the calculation of these solutions are non-trivial and time consuming. Hence, it is a compute intensive application that requires significant compute resources. Furthermore, we strive to generate solutions quickly in order to respond to the urgency of a response. To carry out the calculations we require user-level middleware that can be supporting the workflow of the application and manages the resource assignment in an efficient and fault tolerant fashion. To do so we have prototyped the middleware framework that provides a convenient command line and portal layer of steering applications on Grids. Internally, we utilize a sophisticated workflow engine that provides the ability to access elementary fault tolerant mechanisms for job scheduling. This includes the management of job replicas and the reaction on late return of results. We report the test results of CSC problem solving on a real Grid test bed - the Tera Grid test bed. In addition, we contrast this system architecture with a Hadoop-based implementation that automatically includes fault tolerance. The later activity has been conducted on Future Grid.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"190 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134277050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conflict Avoidance Scheduling Using Grouping List for Transactional Memory","authors":"Do-Chan Choi, Seung-Hun Kim, W. Ro","doi":"10.1109/IPDPSW.2012.66","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.66","url":null,"abstract":"Conventional Transactional Memory (TM) systems may experience performance degradation in applications with high contention, given the fact that execution of transaction will frequently restart due to conflicts. The restarting of transaction essentially requires rollback that is a wasteful operation. To address this point, we developed a system to reduce the overhead caused by high contention. In this paper, we present a method called Conflict Avoidance Scheduling (CAS), which prevents the conflicts in high contention by use of conflict characteristic. In CAS, threads that execute transactions which have high probability of conflicts are grouped together. Based on the group information, concurrent execution of threads in the same group is restricted. Therefore, threads that may cause conflict are serially executed. We evaluate the performance of the proposed design by comparing it with Log TM-SE. The simulation results show that our system improves the performance by 23% on an average in applications with high contention, as compared with the conventional Log TM-SE.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"21 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134555427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fast Parallel Implementation of Molecular Dynamics with the Morse Potential on a Heterogeneous Petascale Supercomputer","authors":"Qiang Wu, Canqun Yang, Feng Wang, Jingling Xue","doi":"10.1109/IPDPSW.2012.13","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.13","url":null,"abstract":"Molecular Dynamics (MD) simulations have been widely used in the study of macromolecules. To ensure an acceptable level of statistical accuracy relatively large number of particles are needed, which calls for high performance implementations of MD. These days heterogeneous systems, with their high performance potential, low power consumption, and high price-performance ratio, offer a viable alternative for running MD simulations. In this paper we introduce a fast parallel implementation of MD simulation with the Morse potential on Tianhe-1A, a petascale heterogeneous supercomputer. Our code achieves a speedup of 3.6× on one NVIDIA Tesla M2050 GPU (containing 14 Streaming Multiprocessors) compared to a 2.93GHz six-core Intel Xeon X5670 CPU. In addition, our code runs faster on 1024 compute nodes (with two CPUs and one GPU inside a node) than on 4096 GPU-excluded nodes, effectively rendering one GPU more efficient than six six-core CPUs. Our work shows that large-scale MD simulations can benefit enormously from GPU acceleration in petascale supercomputing platforms. Our performance results are achieved by using (1) a patch-cell design to exploit parallelism across the simulation domain, (2) a new GPU kernel developed by taking advantage of Newton's Third Law to reduce redundant force computation on GPUs, (3) two optimization methods including a dynamic load balancing strategy that adjusts the workload, and a communication overlapping method to overlap the communications between CPUs and GPUs.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115594582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inference of Huge Trees under Maximum Likelihood","authors":"F. Izquierdo-Carrasco, A. Stamatakis","doi":"10.1109/IPDPSW.2012.309","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.309","url":null,"abstract":"The wide adoption of Next-Generation Sequencing technologies in recent years has generated an avalanche of genetic data, which poses new challenges for large-scale maximum likelihood-based phylogenetic analyses. Improving the scalability of search algorithms and reducing the high memory requirements for computing the likelihood represent major computational challenges in this context. We have introduced methods for solving these key problems and provided respective proof-of-concept implementations. Moreover, we have developed a new tree search strategy that can reduce run times by more than 50% while yielding equally good trees (in the statistical sense). To reduce memory requirements, we explored the applicability of external memory (out-of-core) algorithms as well as a concept that trades memory for additional computations in the likelihood function. The latter concept, only induces a surprisingly small increase in overall execution times. When trading 50% of the required RAM for additional computations, the average execution time increase- because of additional computations-amounts to only 15%. All concepts presented here are sufficiently generic such that they can be applied to all programs that rely on the phylogenetic likelihood function. Thereby, the approaches we have developed will contribute to enable large-scale inferences of whole-genome phylogenies.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"71 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114294008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lessons Learned after the Introduction of Parallel and Distributed Computing Concepts into ECE Undergraduate Curricula at UTN-Bahía Blanca Argentina","authors":"Javier Iparraguirre, G. Friedrich, Ricardo Coppo","doi":"10.1109/IPDPSW.2012.163","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.163","url":null,"abstract":"In 2011 we introduced an elective course on Parallel Processing into the ECE undergraduate curricula. UTN Bahía Blanca was one of the first Universities in Argentina that decided to teach OpenCL. During the same year, we also began participation in the NSF/IEEE TCCP 2011 Early Adopters Program. This work summarizes the lessons we learned in our endeavor of teaching parallel and distributed computing concepts. Additionally, it discusses future improvements to our teaching methods and proposes modifications to our initial curricula.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"190 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117336843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Task Allocation Algorithm Based on Dynamic Coalition in WSNs","authors":"Chengyu Chen, Wenzhong Guo, Guolong Chen","doi":"10.1109/IPDPSW.2012.153","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.153","url":null,"abstract":"Because nodes in WSNs have limited resources and usually work in a severe dynamic environment without human participation, existing task allocation algorithms in WSNs cannot provide fault-tolerant mechanism. Therefore, a new task allocation algorithm which adopts PSO algorithm and multi-agent technology is proposed by us. The algorithm employs primary/backup copy (PB) technology with backup copy overlapping. The simulation experiment shows the proposed algorithm can effectively improve task guarantee ratio save more energy and prolong the lifetime of network.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124810475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compiling C/C++ SIMD Extensions for Function and Loop Vectorizaion on Multicore-SIMD Processors","authors":"Xinmin Tian, Hideki Saito, M. Girkar, S. Preis, Sergey Kozhukhov, Aleksei G. Cherkasov, Clark Nelson, Nikolay Panchenko, Robert Y. Geva","doi":"10.1109/IPDPSW.2012.292","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.292","url":null,"abstract":"SIMD vectorization has received significant attention in the past decade as an important method to accelerate scientific applications, media and embedded applications on SIMD architectures such as Intel® SSE, AVX, and IBM* AltiVec. However, most of the focus has been directed at loops, effectively executing their iterations on multiple SIMD lanes concurrently relying upon program hints and compiler analysis. This paper presents a set of new C/C++ high-level vector extensions for SIMD programming, and the Intel® C++ product compiler that is extended to translate these vector extensions and produce optimized SIMD instruction sequences of vectorized functions and loops. For a function, our main idea is to vectorize the entire function for callers instead of just vectorizing loops (if any) inside the function. It poses the challenge of dealing with complicated control-flow in the function body, and matching caller and callee for SIMD vector calls while vectorizing caller functions (or loops) and callee functions. Our compilation methods for automatically compiling vector extensions are described. We present performance results of several non-trivial visual computing, computational, and simulation workloads, utilizing SIMD units through the vector extensions on Intel® Multicore 128-bit SIMD processors, and we show that significant SIMD speedups (3.07x to 4.69x) are achieved over the serial execution.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124828684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Server-Level Adaptive Data Layout Strategy for Parallel File Systems","authors":"Huaiming Song, Hui Jin, Jun He, Xian-He Sun, R. Thakur","doi":"10.1109/IPDPSW.2012.246","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.246","url":null,"abstract":"Parallel file systems are widely used for providing a high degree of I/O parallelism to mask the gap between I/O and memory speed. However, peak I/O performance is rarely attained due to complex data access patterns of applications. Based on the observation that the I/O performance of small requests is often limited by the request service rate, and the performance of large requests is limited by I/O bandwidth, we take into consideration both factors and propose a server-level adaptive data layout strategy. The proposed strategy adopts different stripe sizes for different file servers according to the data access characteristics on each individual server. We let the file servers that can fully utilize bandwidth hold more data, and the file servers that are limited with request service rate hold less data. As a result, heavy-load servers can offload some data accesses to light-load servers for potential improvement of I/O performance. We present a method to measure access cost for each data block and then utilize an equal-depth histogram approach to distributed data blocks across multiple servers adaptively, so as to balance data accesses on all file servers. Analytical and experimental results demonstrate that the proposed server-level adaptive layout strategy can improve I/O performance by as much as 80.3% and is more appropriate for applications with complex data access patterns.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124869028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Differential Evolution Using Low-Discrepancy Sequences for Image Segmentation","authors":"A. Nakib, B. Daachi, P. Siarry","doi":"10.1109/IPDPSW.2012.79","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.79","url":null,"abstract":"The image thresholding problem can be seen as a problem of optimization of an objective function. Many thresholding techniques have been proposed in the literature and the approximation of normalized histogram of an image by a mixture of Gaussian distributions is one of them. Typically, finding the parameters of Gaussian distributions leads to a nonlinear optimization problem, of which solution is computationally expensive and time-consuming. In this paper, an enhanced version of the classical differential evolution algorithm using low-discrepancy sequences and a local search, called LDE, is used to compute these parameters. Experimental results demonstrate the ability of the algorithm in finding optimal thresholds in case of multilevel thresholding.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123120819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}