2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum: Latest Papers

Generalizing the Utility of GPUs in Large-Scale Heterogeneous Computing Systems
S. Xiao, Wu-chun Feng
{"title":"Generalizing the Utility of GPUs in Large-Scale Heterogeneous Computing Systems","authors":"S. Xiao, Wu-chun Feng","doi":"10.1109/IPDPSW.2012.325","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.325","url":null,"abstract":"Graphics Processing Units (GPUs) have been widely used as accelerators in large-scale heterogeneous computing systems. However, current programming models can only support the utilization of local GPUs. When using non-local GPUs, programmers need to explicitly call API functions for data communication across computing nodes. As such, programming GPUs in large-scale computing systems is more challenging than programming local GPUs, since local and remote GPUs have to be dealt with separately. In this work, we propose a virtual OpenCL (VOCL) framework to support the transparent virtualization of GPUs. This framework, based on the OpenCL programming model, exposes physical GPUs as decoupled virtual resources that can be transparently managed independent of the application execution. To reduce the virtualization overhead, we optimize the GPU memory accesses and kernel launches. We also extend the VOCL framework to support live task migration across physical GPUs to achieve load balance and/or quick system maintenance. Our experimental results indicate that VOCL can greatly simplify the task of programming cluster-based GPUs at a reasonable virtualization cost.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"171 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133310246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
A Simulation Study on Urban Water Threat Detection in Modern Cyberinfrastructures
Lizhe Wang, Dan Chen, Ze Deng, R. Ranjan
{"title":"A Simulation Study on Urban Water Threat Detection in Modern Cyberinfrastructures","authors":"Lizhe Wang, Dan Chen, Ze Deng, R. Ranjan","doi":"10.1109/IPDPSW.2012.127","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.127","url":null,"abstract":"The computation of Contaminant Source Characterization (CSC) is a critical research issue in Water Distribution System (WDS) management. We use a simulation framework to identify optimized locations of sensors that lead to fast detection of contamination sources. The optimization engine is based on a Genetic Algorithm (GA) that interprets trial solutions as individuals. During the optimization process many thousands of these solutions are generated. For a large WDS, the calculation of these solutions is non-trivial and time-consuming. Hence, it is a compute-intensive application that requires significant compute resources. Furthermore, we strive to generate solutions quickly in order to respond to the urgency of a response. To carry out the calculations we require user-level middleware that supports the workflow of the application and manages the resource assignment in an efficient and fault-tolerant fashion. To do so we have prototyped a middleware framework that provides a convenient command line and portal layer for steering applications on Grids. Internally, we utilize a sophisticated workflow engine that provides the ability to access elementary fault-tolerant mechanisms for job scheduling. This includes the management of job replicas and the reaction to the late return of results. We report the test results of CSC problem solving on a real Grid test bed, the TeraGrid test bed. In addition, we contrast this system architecture with a Hadoop-based implementation that automatically includes fault tolerance. The latter activity was conducted on FutureGrid.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"190 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134277050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Conflict Avoidance Scheduling Using Grouping List for Transactional Memory
Do-Chan Choi, Seung-Hun Kim, W. Ro
{"title":"Conflict Avoidance Scheduling Using Grouping List for Transactional Memory","authors":"Do-Chan Choi, Seung-Hun Kim, W. Ro","doi":"10.1109/IPDPSW.2012.66","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.66","url":null,"abstract":"Conventional Transactional Memory (TM) systems may experience performance degradation in applications with high contention, because the execution of transactions will frequently restart due to conflicts. Restarting a transaction requires a rollback, which is a wasteful operation. To address this point, we developed a system to reduce the overhead caused by high contention. In this paper, we present a method called Conflict Avoidance Scheduling (CAS), which prevents conflicts under high contention by using conflict characteristics. In CAS, threads that execute transactions with a high probability of conflict are grouped together. Based on the group information, concurrent execution of threads in the same group is restricted. Therefore, threads that may cause conflicts are executed serially. We evaluate the performance of the proposed design by comparing it with LogTM-SE. The simulation results show that our system improves performance by 23% on average in applications with high contention, as compared with the conventional LogTM-SE.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"21 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134555427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
A Fast Parallel Implementation of Molecular Dynamics with the Morse Potential on a Heterogeneous Petascale Supercomputer
Qiang Wu, Canqun Yang, Feng Wang, Jingling Xue
{"title":"A Fast Parallel Implementation of Molecular Dynamics with the Morse Potential on a Heterogeneous Petascale Supercomputer","authors":"Qiang Wu, Canqun Yang, Feng Wang, Jingling Xue","doi":"10.1109/IPDPSW.2012.13","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.13","url":null,"abstract":"Molecular Dynamics (MD) simulations have been widely used in the study of macromolecules. To ensure an acceptable level of statistical accuracy, a relatively large number of particles is needed, which calls for high performance implementations of MD. These days heterogeneous systems, with their high performance potential, low power consumption, and high price-performance ratio, offer a viable alternative for running MD simulations. In this paper we introduce a fast parallel implementation of MD simulation with the Morse potential on Tianhe-1A, a petascale heterogeneous supercomputer. Our code achieves a speedup of 3.6× on one NVIDIA Tesla M2050 GPU (containing 14 Streaming Multiprocessors) compared to a 2.93GHz six-core Intel Xeon X5670 CPU. In addition, our code runs faster on 1024 compute nodes (with two CPUs and one GPU inside a node) than on 4096 GPU-excluded nodes, effectively rendering one GPU more efficient than six six-core CPUs. Our work shows that large-scale MD simulations can benefit enormously from GPU acceleration in petascale supercomputing platforms. Our performance results are achieved by using (1) a patch-cell design to exploit parallelism across the simulation domain, (2) a new GPU kernel developed by taking advantage of Newton's Third Law to reduce redundant force computation on GPUs, (3) two optimization methods including a dynamic load balancing strategy that adjusts the workload, and a communication overlapping method to overlap the communications between CPUs and GPUs.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115594582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Inference of Huge Trees under Maximum Likelihood
F. Izquierdo-Carrasco, A. Stamatakis
{"title":"Inference of Huge Trees under Maximum Likelihood","authors":"F. Izquierdo-Carrasco, A. Stamatakis","doi":"10.1109/IPDPSW.2012.309","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.309","url":null,"abstract":"The wide adoption of Next-Generation Sequencing technologies in recent years has generated an avalanche of genetic data, which poses new challenges for large-scale maximum likelihood-based phylogenetic analyses. Improving the scalability of search algorithms and reducing the high memory requirements for computing the likelihood represent major computational challenges in this context. We have introduced methods for solving these key problems and provided respective proof-of-concept implementations. Moreover, we have developed a new tree search strategy that can reduce run times by more than 50% while yielding equally good trees (in the statistical sense). To reduce memory requirements, we explored the applicability of external memory (out-of-core) algorithms as well as a concept that trades memory for additional computations in the likelihood function. The latter concept induces only a surprisingly small increase in overall execution times. When trading 50% of the required RAM for additional computations, the average execution time increase, because of additional computations, amounts to only 15%. All concepts presented here are sufficiently generic such that they can be applied to all programs that rely on the phylogenetic likelihood function. Thereby, the approaches we have developed will contribute to enabling large-scale inferences of whole-genome phylogenies.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"71 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114294008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Lessons Learned after the Introduction of Parallel and Distributed Computing Concepts into ECE Undergraduate Curricula at UTN-Bahía Blanca Argentina
Javier Iparraguirre, G. Friedrich, Ricardo Coppo
{"title":"Lessons Learned after the Introduction of Parallel and Distributed Computing Concepts into ECE Undergraduate Curricula at UTN-Bahía Blanca Argentina","authors":"Javier Iparraguirre, G. Friedrich, Ricardo Coppo","doi":"10.1109/IPDPSW.2012.163","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.163","url":null,"abstract":"In 2011 we introduced an elective course on Parallel Processing into the ECE undergraduate curricula. UTN Bahía Blanca was one of the first universities in Argentina to teach OpenCL. During the same year, we also began participating in the NSF/IEEE TCPP 2011 Early Adopters Program. This work summarizes the lessons we learned in our endeavor to teach parallel and distributed computing concepts. Additionally, it discusses future improvements to our teaching methods and proposes modifications to our initial curricula.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"190 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117336843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
A New Task Allocation Algorithm Based on Dynamic Coalition in WSNs
Chengyu Chen, Wenzhong Guo, Guolong Chen
{"title":"A New Task Allocation Algorithm Based on Dynamic Coalition in WSNs","authors":"Chengyu Chen, Wenzhong Guo, Guolong Chen","doi":"10.1109/IPDPSW.2012.153","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.153","url":null,"abstract":"Because nodes in WSNs have limited resources and usually work in a severe dynamic environment without human participation, existing task allocation algorithms in WSNs cannot provide a fault-tolerant mechanism. Therefore, we propose a new task allocation algorithm that adopts the PSO algorithm and multi-agent technology. The algorithm employs primary/backup copy (PB) technology with backup copy overlapping. The simulation experiment shows that the proposed algorithm can effectively improve the task guarantee ratio, save more energy, and prolong the lifetime of the network.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124810475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Compiling C/C++ SIMD Extensions for Function and Loop Vectorizaion on Multicore-SIMD Processors
Xinmin Tian, Hideki Saito, M. Girkar, S. Preis, Sergey Kozhukhov, Aleksei G. Cherkasov, Clark Nelson, Nikolay Panchenko, Robert Y. Geva
{"title":"Compiling C/C++ SIMD Extensions for Function and Loop Vectorizaion on Multicore-SIMD Processors","authors":"Xinmin Tian, Hideki Saito, M. Girkar, S. Preis, Sergey Kozhukhov, Aleksei G. Cherkasov, Clark Nelson, Nikolay Panchenko, Robert Y. Geva","doi":"10.1109/IPDPSW.2012.292","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.292","url":null,"abstract":"SIMD vectorization has received significant attention in the past decade as an important method to accelerate scientific applications, media and embedded applications on SIMD architectures such as Intel® SSE, AVX, and IBM* AltiVec. However, most of the focus has been directed at loops, effectively executing their iterations on multiple SIMD lanes concurrently relying upon program hints and compiler analysis. This paper presents a set of new C/C++ high-level vector extensions for SIMD programming, and the Intel® C++ product compiler that is extended to translate these vector extensions and produce optimized SIMD instruction sequences of vectorized functions and loops. For a function, our main idea is to vectorize the entire function for callers instead of just vectorizing loops (if any) inside the function. It poses the challenge of dealing with complicated control-flow in the function body, and matching caller and callee for SIMD vector calls while vectorizing caller functions (or loops) and callee functions. Our compilation methods for automatically compiling vector extensions are described. We present performance results of several non-trivial visual computing, computational, and simulation workloads, utilizing SIMD units through the vector extensions on Intel® Multicore 128-bit SIMD processors, and we show that significant SIMD speedups (3.07x to 4.69x) are achieved over the serial execution.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124828684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
A Server-Level Adaptive Data Layout Strategy for Parallel File Systems
Huaiming Song, Hui Jin, Jun He, Xian-He Sun, R. Thakur
{"title":"A Server-Level Adaptive Data Layout Strategy for Parallel File Systems","authors":"Huaiming Song, Hui Jin, Jun He, Xian-He Sun, R. Thakur","doi":"10.1109/IPDPSW.2012.246","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.246","url":null,"abstract":"Parallel file systems are widely used for providing a high degree of I/O parallelism to mask the gap between I/O and memory speed. However, peak I/O performance is rarely attained due to complex data access patterns of applications. Based on the observation that the I/O performance of small requests is often limited by the request service rate, and the performance of large requests is limited by I/O bandwidth, we take into consideration both factors and propose a server-level adaptive data layout strategy. The proposed strategy adopts different stripe sizes for different file servers according to the data access characteristics on each individual server. We let the file servers that can fully utilize bandwidth hold more data, and the file servers that are limited by request service rate hold less data. As a result, heavy-load servers can offload some data accesses to light-load servers for potential improvement of I/O performance. We present a method to measure access cost for each data block and then utilize an equal-depth histogram approach to distribute data blocks across multiple servers adaptively, so as to balance data accesses on all file servers. Analytical and experimental results demonstrate that the proposed server-level adaptive layout strategy can improve I/O performance by as much as 80.3% and is more appropriate for applications with complex data access patterns.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124869028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
Hybrid Differential Evolution Using Low-Discrepancy Sequences for Image Segmentation
A. Nakib, B. Daachi, P. Siarry
{"title":"Hybrid Differential Evolution Using Low-Discrepancy Sequences for Image Segmentation","authors":"A. Nakib, B. Daachi, P. Siarry","doi":"10.1109/IPDPSW.2012.79","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.79","url":null,"abstract":"The image thresholding problem can be seen as the problem of optimizing an objective function. Many thresholding techniques have been proposed in the literature, and the approximation of the normalized histogram of an image by a mixture of Gaussian distributions is one of them. Typically, finding the parameters of the Gaussian distributions leads to a nonlinear optimization problem whose solution is computationally expensive and time-consuming. In this paper, an enhanced version of the classical differential evolution algorithm using low-discrepancy sequences and a local search, called LDE, is used to compute these parameters. Experimental results demonstrate the ability of the algorithm to find optimal thresholds in the case of multilevel thresholding.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123120819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16