2014 IEEE International Parallel & Distributed Processing Symposium Workshops: Latest Articles, Page 2

Resource Centered Computing Delivering High Parallel Performance

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.14

J. Gustedt, S. Vialle, P. Mercier

Abstract: Modern parallel programming requires a combination of different paradigms, expertise and tuning, that correspond to the different levels in today's hierarchical architectures. To cope with the inherent difficulty, ORWL (ordered read-write locks) presents a new paradigm and toolbox centered around local or remote resources, such as data, processors or accelerators. ORWL programmers describe their computation in terms of access to these resources during critical sections. Exclusive or shared access to the resources is granted through FIFOs and with read-write semantic. ORWL partially replaces a classical runtime and offers a new API for resource centric parallel programming. We successfully ran an ORWL benchmark application on different parallel architectures (a multicore CPU cluster, a NUMA machine, a CPU+GPU cluster). When processing large data we achieved scalability and performance similar to a reference code built on top of MPI+OpenMP+CUDA. The integration of optimized kernels of scientific computing libraries (ATLAS and cuBLAS) has been almost effortless, and we were able to increase performance using both CPU and GPU cores on our hybrid hierarchical cluster simultaneously. We aim to make ORWL a new easy-to-use and efficient programming model and toolbox for parallel developers.

Citations: 2

Acceleration of GPU-Based Ultrasound Simulation via Data Compression

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.140

Andrew A. Haigh, Eric C. McCreath

Abstract: The realistic simulation of ultrasound wave propagation is computationally intensive. The large size of the grid and low degree of reuse of data means that it places a great demand on memory bandwidth. Graphics Processing Units (GPUs) have attracted attention for performing scientific calculations due to their potential for efficiently performing large numbers of floating point computations. However, many applications may be limited by memory bandwidth, especially for data sets whose size is larger than that of the GPU platform. This problem is only partially mitigated by applying the standard technique of breaking the grid into regions and overlapping the computation of one region with the host-device memory transfer of another. In this paper, we implement a memory-bound GPU-based ultrasound simulation and evaluate the use of a technique for improving performance by compressing the data into a fixed-point representation that reduces the time required for inter-host-device transfers. We demonstrate a speedup of 1.5 times on a simulation where the data is broken into regions that must be copied back and forth between the CPU and GPU. We develop a model that can be used to determine the amount of temporal blocking required to achieve near optimal performance, without extensive experimentation. This technique may also be applied to GPU-based scientific simulations in other domains such as computational fluid dynamics and electromagnetic wave simulation.
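
The fixed-point compression idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the bit width, scaling, or value range they use, so the 16-bit signed format, the assumed input range of [-1, 1), and the function names `compress_fixed_point` and `decompress_fixed_point` are all hypothetical choices made for illustration. The point is only that a 16-bit fixed-point payload is half the size of the same data stored as 32-bit floats, which is what would shorten a host-device transfer.

```python
import struct


def compress_fixed_point(values, scale_bits=15):
    """Quantize floats (assumed to lie in [-1, 1)) to 16-bit signed integers.

    Hypothetical helper: the format is an assumption, not taken from the paper.
    """
    scale = 1 << scale_bits
    lo, hi = -scale, scale - 1  # representable range of a signed 16-bit value
    quantized = [max(lo, min(hi, round(v * scale))) for v in values]
    # Pack as little-endian int16: 2 bytes per sample instead of 4 for float32.
    return struct.pack(f"<{len(quantized)}h", *quantized)


def decompress_fixed_point(payload, scale_bits=15):
    """Invert compress_fixed_point, up to the quantization error."""
    scale = 1 << scale_bits
    count = len(payload) // 2
    return [q / scale for q in struct.unpack(f"<{count}h", payload)]


if __name__ == "__main__":
    samples = [0.0, 0.5, -0.25, 0.999]
    packed = compress_fixed_point(samples)
    as_float32 = struct.pack(f"<{len(samples)}f", *samples)
    print(len(packed), "bytes fixed-point vs", len(as_float32), "bytes float32")
    print(decompress_fixed_point(packed))
```

In this sketch the round-trip error for in-range values is bounded by half a quantization step, that is 2^-16. Whether that precision is acceptable for a given simulation is a separate question that the abstract does not settle.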

Citations: 1

Parallel Heuristics for Scalable Community Detection

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.155

Hao Lu, M. Halappanavar, A. Kalyanaraman, Sutanay Choudhury

Abstract: Community detection has become a fundamental operation in numerous graph-theoretic applications. It is used to reveal natural divisions that exist within real world networks without imposing prior size or cardinality constraints on the set of communities. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed by Blondel et al. in 2008, the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method is also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading, and tested them over real world graphs derived from multiple application domains (e.g., internet, citation, biological). Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in comparable number of iterations, while providing real speedups of up to 8× using 32 threads. In addition, our parallel implementation was able to exhibit weak scaling properties on up to 32 threads.
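
As background for the term "modularity" used in the abstract, the sketch below computes the standard Newman modularity score of a given partition for an unweighted, undirected graph. It is not the Louvain method and not the authors' parallel heuristics; it only evaluates the objective that the Louvain method tries to maximize. The function name `modularity`, the edge-list input format, and the example graph are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an unweighted, undirected graph.

    edges:     list of (u, v) pairs, each undirected edge listed once
    community: dict mapping every node to its community label
    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    intra = defaultdict(int)   # L_c: edges with both endpoints in community c
    degree = defaultdict(int)  # d_c: total degree of community c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (hypothetical example graph).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    merged = {node: "all" for node in range(6)}
    print(modularity(edges, split))   # the two-triangle partition
    print(modularity(edges, merged))  # everything in one community
```

For this example graph the two-triangle partition scores 5/14 and the single-community partition scores 0, which matches the intuition that a higher modularity indicates a partition with denser connections inside communities than across them.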

Citations: 160

Influence of Magnetic Fields and X-Radiation on Ring Oscillators in FPGAs

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.26

Michael Raitza, Markus Vogt, C. Hochberger, Thilo Pionteck

Abstract: Cryptographic functions are of increasing importance for all kinds of hardware devices. Their strength against attackers not only relies on the particular cryptographic algorithm but also on the quality of the underlying random number generator. Several techniques have been proposed for implementing true random number generators in digital circuits, yet their immunity against ionising radiation and strong magnetic fields has often not been evaluated. In particular FPGAs seem to be prone to such kinds of attacks, as ionising radiation and magnetic fields may not only influence logic gates but also the configuration memory. In this paper we investigate the influence of X-rays and magnetic fields on three different types of ring oscillators. We conduct experiments with a constant X-ray beam generated by a tungsten radiation source and strong static magnetic fields up to 14 T. We show that both magnetic fields and X-radiation do not have any influence on the amount of entropy generated by the ring oscillators, hence these implementations can be considered safe against such attacks. The random number generators are implemented on Altera Cyclone IV, Lattice LFE3, and Xilinx Spartan 6 FPGAs.

Citations: 2

Bag-of-Task Scheduling on Power-Aware Clusters Using a DVFS-Based Mechanism

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.95

G. Terzopoulos, H. Karatza

Abstract: Energy reduction is very important nowadays. A large percentage of the workload submitted to large-scale systems is bag-of-tasks (BoT) applications. Each BoT is a collection of independent tasks that do not communicate with each other. They are used in astronomy, Monte Carlo simulations, data mining, fractal calculations, image processing and massive searches. Due to their importance, BoT scheduling is extensively studied regarding performance. In this paper we view BoT scheduling from an energy efficiency perspective. In order to save energy, we apply a Dynamic Voltage/Frequency Scaling (DVFS) mechanism to a heterogeneous cluster environment where BoTs are submitted. A cluster environment is selected due to the fact that clusters are often used as underlying basic components in grids and clouds. In order for our simulation experiments to be more realistic regarding the workload applied in the system, we also consider high-priority tasks. Extensive simulation experiments show that by applying the proposed DVFS mechanism when BoTs are executed, we can achieve energy savings up to 13% without affecting the execution of high-priority tasks.

Citations: 15

Parallel and Distributed Computing across the Computer Science Curriculum

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.121

D. J. John, Stan J. Thomas

Abstract: Two recent curriculum studies, the ACM/IEEE Curricula 2013 Report and the NSF/IEEE-TCPP Curriculum Initiative on Parallel and Distributed Computing, argue that every undergraduate computer science program should include topics in parallel and distributed computing (PDC). Although not within the scope of these reports, there is also a need for students in computing related general education courses to be aware of the role that parallel and distributed computing technologies play in the computing landscape. One approach to integrating these topics into existing curricula is to spread them across several courses. However, this approach requires development of multiple instructional modules targeted to introduce PDC concepts at specific points in the curriculum. Such modules need to mesh with the goals of the courses for which they are designed in such a way that minimal material has to be removed from existing topics. At the same time the modules should provide students with an understanding of and experience employing fundamental PDC concepts. In this paper we report on our experience developing and deploying such modules.

Citations: 19

Over-clocking of Linear Projection Designs through Device Specific Optimisations

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.25

R. Duarte, C. Bouganis

Citations: 3

Training Large Scale Deep Neural Networks on the Intel Xeon Phi Many-Core Coprocessor

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.194

Lei Jin, Zhaokang Wang, Rong Gu, C. Yuan, Y. Huang

Abstract: As a new area of machine learning research, the deep learning algorithm has attracted a lot of attention from the research community. It may bring human beings to a higher cognitive level of data. Its unsupervised pre-training step allows us to find high-dimensional representations or abstract features which work much better than the principal component analysis (PCA) method. However, it will face problems when being applied to deal with large scale data due to its intensive computation from many levels of training process against large scale data. The sequential deep learning algorithms usually can not finish the computation in an acceptable time. In this paper, we propose a many-core algorithm which is based on a parallel method and is used in the Intel Xeon Phi many-core systems to speed up the unsupervised training process of Sparse Autoencoder and Restricted Boltzmann Machine (RBM). Using the sequential training algorithm as a baseline to compare, we adopted several optimization methods to parallelize the algorithm. The experimental results show that our fully-optimized algorithm gains more than 300-fold speedup on parallelized Sparse Autoencoder compared with the original sequential algorithm on the Intel Xeon Phi coprocessor. Also, we ran the fully-optimized code on both the Intel Xeon Phi coprocessor and an expensive Intel Xeon CPU. Our method on the Intel Xeon Phi coprocessor is 7 to 10 times faster than the Intel Xeon CPU for this application. In addition to this, we compared our fully-optimized code on the Intel Xeon Phi with a Matlab code running on single Intel Xeon CPU. Our method on the Intel Xeon Phi runs 16 times faster than the Matlab implementation. The result also suggests that the Intel Xeon Phi can offer an efficient but more general-purposed way to parallelize the deep learning algorithm compared to GPU. It also achieves faster speed with better parallelism than the Intel Xeon CPU.

Citations: 28

A Framework for Customizing Virtual 3-D Reconfigurable Platforms at Run-Time

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.201

K. Siozios, D. Soudris, M. Hübner

Citations: 1

Trust-Based Security for the Spanning Tree Protocol

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.150

Yingxu Lai, Qiuyue Pan, Zenghui Liu, Yinong Chen, Zhizheng Zhou

Citations: 2