2014 IEEE International Parallel & Distributed Processing Symposium Workshops: Latest Papers, Page 6

SkewControl: Gini Out of the Bottle

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.176

Si Zheng, Yunhuai Liu, T. He, Shanshan Li, Xiangke Liao

Citations: 2

Construction of Porous Networks Subjected to Geometric Restrictions by Using OpenMP

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.134

A. Mendez, G. Román-Alonso, F. Rojas-González, M. Castro-García, M. Cornejo, Salomón Cordero-Sánchez

Abstract: The study of porous materials is of great importance for a vast number of industrial applications. In order to study some specific characteristics of materials, in-silico simulations can be employed. The particular simulation of pore networks described in this work finds its basis in the Dual Site-Bond Model (DSBM). Under this approach, a porous material is thought to be made of sites (cavities, bulges) interconnected to each other through bonds (throats, capillaries); while every site is connected to a number of bonds, each bond is the link between two sites. At present, several computing algorithms have been implemented for the simulation of pore networks; nevertheless, only a few of these methods take into account the geometric restrictions that arise during the interconnection of a set of bonds to every site of the network. It is likely that introducing restrictions of this sort in the computing algorithms would lead to the implementation of more realistic pore networks. In this work, a sequential algorithm and its parallel computing version are proposed to construct pore networks, allowing geometrical restrictions among hollow entities. Our parallel approach uses OpenMP to create a set of threads (computing tasks) that work simultaneously on independent and random pore network regions. We discuss the obtained results.

Citations: 2

Searching for the Optimal Data Partitioning Shape for Parallel Matrix Matrix Multiplication on 3 Heterogeneous Processors

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.8

Ashley M. DeFlumere, Alexey L. Lastovetsky

Abstract: Parallel Matrix-Matrix Multiplication (MMM) is a fundamental part of the linear algebra libraries used by scientific applications on high performance computers. As heterogeneous systems have emerged as high performance computing platforms, the traditional homogeneous algorithms have been adapted to these heterogeneous environments. Although heterogeneous systems have been in use for some time, how to optimally partition data on heterogeneous processors to minimize computation, communication, and execution time remains an open problem. While the question of how to subdivide these MMM problems among heterogeneous processors has been studied, the underlying assumption of this prior study is that the data partition shape, the layout of the data within the matrix assigned to each processor, should be rectangular, i.e. that each processor should be assigned a rectangular portion of the matrix to compute. Our previous work in this area questioned the optimality of this traditional rectangular shape and studied this partition shape problem for two processors. In that work, we proposed a novel mathematical method for transforming partition shapes to decrease communication cost and an analytical technique for determining the optimal shape. In this work, we extend this technique to apply to three and more heterogeneous processors. While applying this method to two processors is relatively straightforward, the complexity grows immensely when considering three processors. With this complexity in mind, we propose a hybrid of experimental and analytical techniques. We postulate that a small number of partition shapes are potentially optimal, and perform extensive testing using a computer-aided method to apply our previously developed analytical technique, without finding a counterexample. We identified six data partition shapes which are candidates to be the optimal three-processor shape.
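
To make the idea of a partition shape's communication cost concrete, the following is a minimal sketch and not the authors' method. It assumes a simple cost model in which every matrix element must be sent to each other processor that owns elements in the same row (and likewise the same column), once per step of the multiplication. The function names, the owner-grid representation, and the two example shapes are hypothetical and chosen only for illustration; the examples use two processors for brevity, although the counting function accepts any number of processor IDs.

```python
from typing import List, Sequence


def communication_volume(owner: Sequence[Sequence[int]]) -> int:
    """Count elements communicated for an N x N owner grid.

    Assumed cost model (an illustration, not taken from the paper):
    a row shared by c processors costs N * (c - 1), and the same
    holds for each column.
    """
    n = len(owner)
    if any(len(row) != n for row in owner):
        raise ValueError("owner grid must be square")
    row_cost = sum(n * (len(set(row)) - 1) for row in owner)
    col_cost = sum(
        n * (len({owner[i][j] for i in range(n)}) - 1) for j in range(n)
    )
    return row_cost + col_cost


def straight_line(n: int, width: int) -> List[List[int]]:
    """Rectangular shape: processor 1 owns the rightmost `width` columns."""
    return [[1 if j >= n - width else 0 for j in range(n)] for _ in range(n)]


def square_corner(n: int, side: int) -> List[List[int]]:
    """Non-rectangular remainder: processor 1 owns a side x side corner."""
    return [
        [1 if (i >= n - side and j >= n - side) else 0 for j in range(n)]
        for i in range(n)
    ]
```

Under this assumed model, `straight_line(16, 1)` and `square_corner(16, 4)` give the slower processor the same 16 elements, yet the first costs 256 element transfers and the second 128, which illustrates why a non-rectangular shape can be worth considering when processor speeds differ widely.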

Citations: 11

Large Scale Discriminative Metric Learning

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.181

P. Kirchner, Matthias Boehm, B. Reinwald, D. Sow, J. M. Schmidt, D. Turaga, A. Biem

Abstract: We consider the learning of a distance metric, using the Localized Supervised Metric Learning (LSML) scheme, that discriminates entities characterized by high dimensional feature attributes, with respect to labels assigned to each entity. LSML is a supervised learning scheme that learns a Mahalanobis distance grouping together features with the same label and repulsing features with different labels. In this paper, we propose an efficient and scalable implementation of LSML allowing us to scale significantly and process large data sets, both in terms of dimensions and instances. This implementation of LSML is programmed in SystemML with an R-like syntax, and compiled, optimized, and executed on Hadoop. We also propose experimental approaches for the tuning of LSML parameters yielding significant analytical and empirical improvements in terms of discriminative measures such as label prediction accuracy. We present experimental results on both synthetic and real-world data (feature vectors representing patients in an Intensive Care Unit with labels corresponding to different conditions), assessing respectively how well the algorithm scales and how well it works on real-world prediction problems.
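
For readers unfamiliar with the Mahalanobis distance mentioned in the abstract, here is a minimal sketch of how such a distance is evaluated once a matrix has been learned. It does not implement LSML or use SystemML; the function name and the plain-list matrix representation are assumptions made for illustration, and the matrix `M` is assumed to be symmetric positive semidefinite.

```python
import math
from typing import Sequence


def mahalanobis(
    x: Sequence[float], y: Sequence[float], M: Sequence[Sequence[float]]
) -> float:
    """Return sqrt((x - y)^T M (x - y)) for a learned matrix M.

    M is assumed symmetric positive semidefinite, so the quadratic
    form under the square root is never negative.
    """
    d = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M @ d, written out with plain lists.
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    quad = sum(d[i] * Md[i] for i in range(len(d)))
    return math.sqrt(quad)
```

With the identity matrix this reduces to the Euclidean distance, so `mahalanobis([0, 0], [3, 4], [[1, 0], [0, 1]])` is 5.0. A learned matrix such as `[[4, 0], [0, 0]]` stretches the first feature and ignores the second, giving 6.0 for the same pair.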

Citations: 4

CHIUW Introduction and Committees

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.232

B. Chamberlain

Citations: 0

Extracting Maximal Exact Matches on GPU

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.159

Anas Abu-Doleh, K. Kaya, M. Abouelhoda, Ümit V. Çatalyürek

Abstract: The revolution in high-throughput sequencing technologies accelerated the discovery and extraction of various genomic sequences. However, the massive size of the generated datasets raises several computational problems. For example, aligning the sequences or finding the similar regions in them, which is one of the crucial steps in many bioinformatics pipelines, is a time-consuming task. Maximal exact matches have been considered important to detect and evaluate the similarity. Most of the existing tools that are designed and developed to find the maximal matches are based on advanced index structures such as suffix trees or suffix arrays. Although these structures triggered the development of efficient search algorithms, they need large indexing tables, which yield a large memory footprint for the software using them and bring significant overhead. In this article, we introduce a novel tool, GPUMEM, which effectively utilizes the massively parallel GPU threads while finding maximal exact matches inside two genome sequences using a lightweight indexing structure. The index construction, which is also handled on the GPU, is so fast that even when the index generation time is included, GPUMEM can be faster in practice than a state-of-the-art tool that uses a pre-built index.
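
As a reference point for what a maximal exact match is, the following is a brute-force sketch, not GPUMEM's algorithm and with no index structure at all. It assumes the usual definition: an exact match between two sequences that cannot be extended to the left or to the right. The function name and the `(start_in_s, start_in_t, length)` tuple layout are hypothetical.

```python
from typing import List, Tuple


def maximal_exact_matches(s: str, t: str, min_len: int) -> List[Tuple[int, int, int]]:
    """Return (i, j, length) for every maximal exact match of s and t.

    Brute force, quadratic in the sequence lengths and worse on
    repetitive input; only meant to pin down the definition.
    """
    matches = []
    for i in range(len(s)):
        for j in range(len(t)):
            if s[i] != t[j]:
                continue
            # Skip starts that could be extended to the left.
            if i > 0 and j > 0 and s[i - 1] == t[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            k = 0
            while i + k < len(s) and j + k < len(t) and s[i + k] == t[j + k]:
                k += 1
            if k >= min_len:
                matches.append((i, j, k))
    return matches
```

For example, `maximal_exact_matches("xabcy", "zabcw", 2)` returns `[(1, 1, 3)]`, the single shared substring `abc`. Index-based tools avoid the nested loop over all position pairs, which is what makes them practical on genome-sized inputs.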

Citations: 8

Hardware/Software Vectorization for Closeness Centrality on Multi-/Many-Core Architectures

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.156

Ahmet Erdem Sarıyüce, Erik Saule, K. Kaya, Ümit V. Çatalyürek

Abstract: Centrality metrics have been shown to be highly correlated with the importance and loads of the nodes in a network. Given the scale of today's social networks, it is essential to use efficient algorithms and high performance computing techniques for their fast computation. In this work, we exploit hardware and software vectorization in combination with fine-grained parallelization to compute the closeness centrality values. The proposed vectorization approach enables us to do concurrent breadth-first search operations and significantly increases the performance. We provide a comparison of different vectorization schemes and experimentally evaluate our contributions with respect to the existing parallel CPU-based solutions on cutting-edge hardware. Our implementations are 11 times faster than the state-of-the-art implementation for a graph with 234 million edges. The proposed techniques also show how vectorization can be efficiently utilized to execute other graph kernels that require multiple traversals over a large-scale network on cutting-edge architectures.
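
The kernel that the abstract accelerates can be sketched in scalar form as follows. This is a plain sequential baseline, assuming an unweighted graph stored as an adjacency dictionary and assuming one common definition of closeness (the reciprocal of the sum of shortest-path distances to all reachable vertices). It runs one breadth-first search per source, whereas the paper's contribution is to run many searches concurrently with vector instructions; none of that is shown here, and the function name is hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List


def closeness_centrality(adj: Dict[Hashable, List[Hashable]]) -> Dict[Hashable, float]:
    """Closeness of every vertex via one BFS per source vertex.

    Assumed definition: 1 / (sum of hop distances to reachable
    vertices), and 0.0 for a vertex that reaches nothing.
    """
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        farness = sum(dist.values())
        result[source] = 1.0 / farness if farness > 0 else 0.0
    return result
```

On the three-vertex path `{0: [1], 1: [0, 2], 2: [1]}` the middle vertex gets 0.5 and each end vertex gets 1/3. Because every source needs its own traversal, the total work is one BFS per vertex, which is why batching several searches together pays off on large graphs.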

Citations: 10

EA: Research-Infused Teaching of Parallel Programming Concepts for Undergraduate Software Engineering Students

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.122

Nasser Giacaman, O. Sinnen

Citations: 7

HiPGA: A High Performance Genome Assembler for Short Read Sequence Data

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.68

Xiaohui Duan, Kun Zhao, Weiguo Liu

Abstract: Emerging next-generation sequencing technologies have opened up exciting new opportunities for genome sequencing by generating read data with a massive throughput. However, the generated reads are significantly shorter compared to the traditional Sanger shotgun sequencing method. This poses challenges for de novo assembly algorithms in terms of both accuracy and efficiency. Due to the continuing explosive growth of short read databases, there is a high demand to accelerate the often repeated long-runtime assembly task. In this paper, we present a scalable parallel algorithm, HiPGA, to accelerate the de Bruijn graph-based genome assembly for high-throughput short read data. In order to make full use of the compute power of both shared-memory multi-core CPUs and distributed-memory systems, we have used a parallelized file I/O scheme as well as a hybrid parallelism for the whole assembly pipeline. Evaluations using three real paired-end datasets and the Yoruba individual dataset show that, compared to two other well-parallelized assemblers, ABySS and PASHA, HiPGA achieves speedups of up to 7 while delivering comparable accuracy on 64 CPU cores of a compute cluster.
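
The de Bruijn graph that underlies this style of assembly can be sketched in a few lines. This is a sequential toy construction, not HiPGA's parallel pipeline; it assumes the common convention in which nodes are (k-1)-mers and every k-mer in a read contributes one edge from its prefix to its suffix. The function name and the nested-dictionary representation with edge multiplicities are hypothetical, and reverse complements and sequencing errors are ignored.

```python
from collections import defaultdict
from typing import Dict, Iterable


def build_de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Dict[str, int]]:
    """Map each (k-1)-mer to its successor (k-1)-mers with edge counts."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    # Convert to plain dicts so the result compares and prints cleanly.
    return {u: dict(successors) for u, successors in graph.items()}
```

For the reads `["ACGT", "CGTA"]` with `k = 3`, the result is `{"AC": {"CG": 1}, "CG": {"GT": 2}, "GT": {"TA": 1}}`: the overlap between the two reads shows up as an edge seen twice, and walking the edges spells out `ACGTA`.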

Citations: 6

XSW: Accelerating Biological Database Search on Xeon Phi

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.108

Lipeng Wang, Yuandong Chan, Xiaohui Duan, Haidong Lan, Xiangxu Meng, Weiguo Liu

Abstract: In this paper we present XSW, a new parallel Smith-Waterman algorithm for searching protein sequence databases on the Xeon Phi coprocessor. In order to make full use of the compute power of the many-core Xeon Phi hardware, we have used a two-level parallelization scheme: thread-level coarse-grained and VPU-level fine-grained parallelism. At the thread level, XSW employs multi-threading to implement the SIMD parallelism. At the VPU level, we have used the Knights Corner instructions to gain more data parallelism. We have also reorganized the database and made use of the parallel shuffling operations on Xeon Phi to achieve better I/O efficiency. Evaluations on real protein sequence databases show that XSW achieves a peak performance of 70 GCUPS on a single Intel Xeon Phi 7110 card. Compared to two other well-parallelized Smith-Waterman algorithms, the multi-core CPU-based SWIPE and the GPU-based CUDASW++ 3.0, XSW achieves much better performance than SWIPE, and comparable performance to but better accuracy than CUDASW++ 3.0. To our knowledge this is the first reported implementation of the Smith-Waterman algorithm on Xeon Phi. The executable binary code of XSW is available at http://sdu-hpcl.github.io/XSW/.
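
The recurrence that XSW parallelizes can be shown in scalar form. The following is a minimal sketch of Smith-Waterman local alignment scoring, not XSW's vectorized implementation. It assumes a simplified scoring scheme with a fixed match reward, a fixed mismatch penalty, and a linear gap penalty; protein search tools normally use a substitution matrix and affine gap penalties instead. The function name and default parameter values are hypothetical.

```python
def smith_waterman_score(
    a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1
) -> int:
    """Best local alignment score of a and b (linear gap penalty).

    Keeps only two rows of the dynamic-programming table, so memory
    is proportional to len(b).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: a local alignment may restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

With the default parameters, `smith_waterman_score("GGACGTGG", "TTACGTTT")` is 8, the score of the shared `ACGT`. Each inner-loop iteration is one cell update; the GCUPS figure quoted in the abstract counts billions of such updates per second.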

Citations: 30