2014 IEEE International Parallel & Distributed Processing Symposium Workshops: Latest Publications

SkewControl: Gini Out of the Bottle
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.176
Si Zheng, Yunhuai Liu, T. He, Shanshan Li, Xiangke Liao
Abstract: In the age of big data, MapReduce plays an important role in extreme-scale data processing systems. Among all the hot issues, data skew weighs heavily on MapReduce system performance. Traditional approaches leave the issue to users, which requires them to possess application-dependent domain knowledge. Other approaches address the issue automatically but in an open-loop manner, which lacks sufficient adaptivity across different applications. To address these issues, we conduct trace-driven empirical studies and show that skew has strongly stable and predictable characteristics, which allows us to design a closed-loop automatic mechanism for task partitioning and scheduling, called SkewControl. We implement SkewControl on top of a Hadoop 1.0.4 production system. The experimental results show that, compared with the state-of-the-art LATE and SkewTune systems, SkewControl can consistently improve system response time by 23.8% and 17%, respectively.
Citations: 2
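The title alludes to the Gini coefficient, a standard inequality measure; the abstract does not say how SkewControl quantifies skew. Purely as an illustrative assumption, the minimal Python sketch below measures the imbalance of per-partition workloads (for example, record counts per reduce task) with the Gini coefficient. The function name `gini` and the choice of input are hypothetical, not taken from the paper.

```python
def gini(loads):
    """Gini coefficient of per-task loads (hypothetical skew metric).

    0.0 means perfectly balanced; values approaching 1.0 mean the work
    is concentrated in a single task.
    """
    n = len(loads)
    total = sum(loads)
    if n == 0 or total == 0:
        return 0.0
    # Sum of |a - b| over all ordered pairs, normalised by 2 * n^2 * mean,
    # which equals 2 * n * total.
    diff_sum = sum(abs(a - b) for a in loads for b in loads)
    return diff_sum / (2 * n * total)
```

For instance, four equal partitions give 0.0, while `gini([0, 0, 0, 10])` gives 0.75. A closed-loop scheduler could in principle monitor such a value and repartition when it grows, but that control logic is not described in the abstract and is not sketched here.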
Construction of Porous Networks Subjected to Geometric Restrictions by Using OpenMP
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.134
A. Mendez, G. Román-Alonso, F. Rojas-González, M. Castro-García, M. Cornejo, Salomón Cordero-Sánchez
Abstract: The study of porous materials is of great importance for a vast number of industrial applications. In-silico simulations can be employed to study specific characteristics of materials. The particular simulation of pore networks described in this work is based on the Dual Site-Bond Model (DSBM). Under this approach, a porous material is thought of as made of sites (cavities, bulges) interconnected through bonds (throats, capillaries): every site is connected to a number of bonds, and each bond links two sites. Several computing algorithms have been implemented for simulating pore networks; nevertheless, only a few of them take into account the geometric restrictions that arise when a set of bonds is interconnected to each site of the network. Introducing restrictions of this sort into the computing algorithms would likely lead to more realistic pore networks. In this work, a sequential algorithm and its parallel version are proposed to construct pore networks while allowing geometrical restrictions among hollow entities. Our parallel approach uses OpenMP to create a set of threads (computing tasks) that work simultaneously on independent, random regions of the pore network. We discuss the obtained results.
Citations: 2
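In the DSBM, the basic construction principle is that a bond can be no larger than either of the two sites it joins. The minimal Python sketch below checks only that principle on a hypothetical 2D square lattice; the lattice layout, the array names, and `count_violations` are illustrative assumptions, and neither the paper's additional geometric restrictions nor its OpenMP parallelization is modeled.

```python
from typing import List

Grid = List[List[float]]

def count_violations(sites: Grid, h_bonds: Grid, v_bonds: Grid) -> int:
    """Count bonds that break the DSBM construction principle.

    sites[r][c]   : size of the site at row r, column c
    h_bonds[r][c] : size of the bond joining sites (r, c) and (r, c + 1)
    v_bonds[r][c] : size of the bond joining sites (r, c) and (r + 1, c)
    A bond is valid only if it is no larger than either site it connects.
    """
    rows, cols = len(sites), len(sites[0])
    bad = 0
    for r in range(rows):              # horizontal bonds
        for c in range(cols - 1):
            if h_bonds[r][c] > min(sites[r][c], sites[r][c + 1]):
                bad += 1
    for r in range(rows - 1):          # vertical bonds
        for c in range(cols):
            if v_bonds[r][c] > min(sites[r][c], sites[r + 1][c]):
                bad += 1
    return bad
```

Because the check for each bond only reads its two neighbouring sites, disjoint lattice regions could be validated independently, which is the kind of independence a region-based parallel construction relies on.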
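Searching for the Optimal Data Partitioning Shape for Parallel Matrix Matrix Multiplication on 3 Heterogeneous Processors
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.8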
Ashley M. DeFlumere, Alexey L. Lastovetsky
Abstract: Parallel Matrix-Matrix Multiplication (MMM) is a fundamental part of the linear algebra libraries used by scientific applications on high-performance computers. As heterogeneous systems have emerged as high-performance computing platforms, traditional homogeneous algorithms have been adapted to these heterogeneous environments. Although heterogeneous systems have been in use for some time, how to optimally partition data on heterogeneous processors to minimize computation, communication, and execution time remains an open problem. While the question of how to subdivide MMM problems among heterogeneous processors has been studied, the underlying assumption of this prior work is that the data partition shape, i.e., the layout of the data within the matrix assigned to each processor, should be rectangular, so that each processor is assigned a rectangular portion of the matrix to compute. Our previous work in this area questioned the optimality of this traditional rectangular shape and studied the partition shape problem for two processors. In that work, we proposed a novel mathematical method for transforming partition shapes to decrease communication cost and an analytical technique for determining the optimal shape. In this work, we extend this technique to three and more heterogeneous processors. While applying this method to two processors is relatively straightforward, the complexity grows immensely for three processors. With this complexity in mind, we propose a hybrid of experimental and analytical techniques. We postulate that a small number of partition shapes are potentially optimal, and we perform extensive testing with a computer-aided method that applies our previously developed analytical technique, without finding a counterexample. We identified six data partition shapes that are candidates for the optimal three-processor shape.
Citations: 11
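For rectangular partitions of a matrix normalized to the unit square, a commonly used proxy for communication volume is the sum of the half-perimeters (width + height) of the processors' rectangles, since each rectangle of C needs the matching rows of A and columns of B. The minimal Python sketch below compares two rectangular layouts for three processors under that proxy; the layouts and function names are illustrative assumptions, and the non-rectangular candidate shapes from the paper need a different cost model that is not attempted here.

```python
from typing import List, Tuple

Rect = Tuple[float, float]  # (width, height) of one processor's region in the unit square

def half_perimeter_sum(rects: List[Rect]) -> float:
    """Sum of w + h over all regions: a common proxy for total communication volume."""
    return sum(w + h for w, h in rects)

def straight_strips(speeds: List[float]) -> List[Rect]:
    """Each processor gets a full-height vertical strip with width proportional to its speed."""
    total = sum(speeds)
    return [(s / total, 1.0) for s in speeds]

def column_then_split(s0: float, s1: float, s2: float) -> List[Rect]:
    """Processor 0 gets a full-height column; processors 1 and 2 split the rest horizontally."""
    total = s0 + s1 + s2
    w0 = s0 / total
    rest = 1.0 - w0
    # Heights are chosen so that each area equals the processor's share of total speed.
    return [(w0, 1.0), (rest, s1 / (s1 + s2)), (rest, s2 / (s1 + s2))]
```

Both layouts balance computation (every area is proportional to speed), so only the communication proxy differs. With relative speeds 2:1:1, the strips cost 4.0 and the column-then-split layout costs 3.5; in general, three strips always cost 4 while the second layout costs 4 - w0.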
Large Scale Discriminative Metric Learning
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.181
P. Kirchner, Matthias Boehm, B. Reinwald, D. Sow, J. M. Schmidt, D. Turaga, A. Biem
Abstract: We consider learning a distance metric, using the Localized Supervised Metric Learning (LSML) scheme, that discriminates entities characterized by high-dimensional feature attributes with respect to the labels assigned to each entity. LSML is a supervised learning scheme that learns a Mahalanobis distance that groups together features with the same label and repels features with different labels. In this paper, we propose an efficient and scalable implementation of LSML that allows us to scale significantly and process large data sets, in terms of both dimensions and instances. This implementation of LSML is programmed in SystemML with an R-like syntax, and is compiled, optimized, and executed on Hadoop. We also propose experimental approaches for tuning LSML parameters, yielding significant analytical and empirical improvements in discriminative measures such as label prediction accuracy. We present experimental results on both synthetic and real-world data (feature vectors representing patients in an Intensive Care Unit, with labels corresponding to different conditions), assessing, respectively, how well the algorithm scales and how well it works on real-world prediction problems.
Citations: 4
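A Mahalanobis distance has the form d_M(x, y) = sqrt((x - y)^T M (x - y)) with M positive semidefinite; writing M = L^T L guarantees that property and turns the distance into ||L (x - y)||. The minimal Python sketch below only evaluates such a learned metric; the LSML training objective and the SystemML implementation are not reproduced, and the function name is hypothetical.

```python
import math
from typing import List, Sequence

def mahalanobis(x: Sequence[float], y: Sequence[float], L: List[List[float]]) -> float:
    """d_M(x, y) = sqrt((x - y)^T M (x - y)) with M = L^T L, computed as ||L (x - y)||."""
    diff = [a - b for a, b in zip(x, y)]
    # Project the difference vector with L, then take the Euclidean norm.
    projected = [sum(row[j] * diff[j] for j in range(len(diff))) for row in L]
    return math.sqrt(sum(v * v for v in projected))
```

With L equal to the identity this reduces to the Euclidean distance (5.0 between (0, 0) and (3, 4)); with L = [[2, 0], [0, 0]] the first feature is stretched and the second ignored, giving 6.0. A learned L does the same thing in a data-driven way, emphasizing features that separate labels.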
CHIUW Introduction and Committees
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.232
B. Chamberlain
Abstract: Background: Chapel (http://chapel.cray.com) is an emerging parallel programming language whose design and implementation are being led by Cray Inc. in collaboration with members of computing labs, academia, and industry, both domestically and internationally. Having successfully fulfilled its research objectives under the DARPA High Productivity Computing Systems (HPCS) program that launched it, Chapel is now at the outset of a five-year effort to improve its performance, stability, and utility for real users in the field.
Citations: 0
Extracting Maximal Exact Matches on GPU
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.159
Anas Abu-Doleh, K. Kaya, M. Abouelhoda, Ümit V. Çatalyürek
Abstract: The revolution in high-throughput sequencing technologies has accelerated the discovery and extraction of various genomic sequences. However, the massive size of the generated datasets raises several computational problems. For example, aligning the sequences or finding similar regions in them, one of the crucial steps in many bioinformatics pipelines, is a time-consuming task. Maximal exact matches have been considered important for detecting and evaluating similarity. Most existing tools designed to find maximal matches are based on advanced index structures such as suffix trees or suffix arrays. Although these structures enabled the development of efficient search algorithms, they need large indexing tables, which yield a large memory footprint for the software using them and bring significant overhead. In this article, we introduce a novel tool, GPUMEM, which effectively utilizes massively parallel GPU threads to find maximal exact matches between two genome sequences using a lightweight indexing structure. The index construction, which is also handled on the GPU, is so fast that even when the index generation time is included, GPUMEM can be faster in practice than a state-of-the-art tool that uses a pre-built index.
Citations: 8
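The abstract does not describe GPUMEM's lightweight index or its GPU kernels. As a reference point, the minimal Python sketch below implements only the definition of a maximal exact match: an exact match between two sequences that cannot be extended to the left or to the right. It is brute force with no index at all, which makes it useful mainly for checking a faster implementation on small inputs; the function name is hypothetical.

```python
from typing import List, Tuple

def maximal_exact_matches(a: str, b: str, min_len: int) -> List[Tuple[int, int, int]]:
    """Return (pos_in_a, pos_in_b, length) for every maximal exact match of at least min_len.

    Brute force, O(len(a) * len(b) * match length).
    """
    mems = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Skip mismatches and matches that could be extended one character to the left.
            if a[i] != b[j] or (i > 0 and j > 0 and a[i - 1] == b[j - 1]):
                continue
            # Extend to the right as far as possible, which makes the match right-maximal.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return mems
```

For example, `maximal_exact_matches("ACGTACGT", "TACG", 3)` returns `[(0, 1, 3), (3, 0, 4)]`: "ACG" at the start of the first sequence and "TACG" starting at position 3. Index-based tools avoid the quadratic scan by only visiting positions that share a seed.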
Hardware/Software Vectorization for Closeness Centrality on Multi-/Many-Core Architectures
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.156
Ahmet Erdem Sarıyüce, Erik Saule, K. Kaya, Ümit V. Çatalyürek
Abstract: Centrality metrics have been shown to be highly correlated with the importance and load of the nodes in a network. Given the scale of today's social networks, it is essential to use efficient algorithms and high-performance computing techniques for their fast computation. In this work, we exploit hardware and software vectorization in combination with fine-grain parallelization to compute closeness centrality values. The proposed vectorization approach enables us to perform concurrent breadth-first search operations and significantly increases performance. We compare different vectorization schemes and experimentally evaluate our contributions against existing parallel CPU-based solutions on cutting-edge hardware. Our implementations are 11 times faster than the state-of-the-art implementation for a graph with 234 million edges. The proposed techniques show how vectorization can be efficiently utilized to execute other graph kernels that require multiple traversals over a large-scale network on cutting-edge architectures.
Citations: 10
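The key idea in the abstract is running many breadth-first searches concurrently so that one pass over the edges advances all of them. The minimal Python sketch below illustrates that idea under stated assumptions: arbitrary-precision integers stand in for wide vector registers (bit s of a mask belongs to the BFS rooted at vertex s), the graph is undirected and given as adjacency lists, and closeness is taken as 1 / farness, one common convention. It is not the authors' SIMD implementation, and the function name is hypothetical.

```python
from typing import List

def closeness_bitparallel(adj: List[List[int]]) -> List[float]:
    """Closeness centrality from n simultaneous BFS traversals packed into bitsets."""
    n = len(adj)
    visited = [1 << v for v in range(n)]  # bit s of visited[v]: BFS from s has reached v
    frontier = visited[:]                 # every BFS starts at its own source
    farness = [0] * n                     # farness[s]: sum of distances from s
    level = 0
    while any(frontier):
        level += 1
        nxt = [0] * n
        for v in range(n):
            reach = 0
            for u in adj[v]:
                reach |= frontier[u]      # all BFSs whose frontier touches a neighbour
            nxt[v] = reach & ~visited[v]  # keep only BFSs reaching v for the first time
        for v in range(n):
            visited[v] |= nxt[v]
            bits = nxt[v]
            while bits:                   # each newly set bit s: s reaches v at distance `level`
                low = bits & -bits
                farness[low.bit_length() - 1] += level
                bits ^= low
        frontier = nxt
    return [1.0 / f if f else 0.0 for f in farness]
```

On the path 0-1-2 (`[[1], [0, 2], [1]]`) the farness values are 3, 2, and 3, so the middle vertex has the highest closeness. In a vectorized implementation, each `|` and `&` over a mask would become a SIMD instruction over a fixed-width batch of sources.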
EA: Research-Infused Teaching of Parallel Programming Concepts for Undergraduate Software Engineering Students
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.122
Nasser Giacaman, O. Sinnen
Abstract: This paper presents experience using a research-infused teaching approach for an undergraduate parallel programming course. The research-teaching nexus is applied at various levels, first through research-led teaching of core parallel programming concepts and of the latest developments from the affiliated research group. The bulk of the course, however, focuses on student-driven, research-based and research-tutored teaching approaches: students actively participate in groups on research projects and are fully immersed in the learning activity of their respective project, while at the same time participating in discussions of wider parallel programming topics across other groups. This close affiliation between the undergraduate course and the research group results in a wide range of benefits for all involved.
Citations: 7
HiPGA: A High Performance Genome Assembler for Short Read Sequence Data
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.68
Xiaohui Duan, Kun Zhao, Weiguo Liu
Abstract: Emerging next-generation sequencing technologies have opened up exciting new opportunities for genome sequencing by generating read data with massive throughput. However, the generated reads are significantly shorter than those from the traditional Sanger shotgun sequencing method. This poses challenges for de novo assembly algorithms in terms of both accuracy and efficiency. Moreover, due to the continuing explosive growth of short-read databases, there is high demand to accelerate the often-repeated, long-running assembly task. In this paper, we present a scalable parallel algorithm, HiPGA, to accelerate de Bruijn graph-based genome assembly for high-throughput short-read data. To make full use of the compute power of both shared-memory multi-core CPUs and distributed-memory systems, we use a parallelized file I/O scheme as well as hybrid parallelism for the whole assembly pipeline. Evaluations using three real paired-end datasets and the Yoruba individual dataset show that, compared with two other well-parallelized assemblers, ABySS and PASHA, HiPGA achieves speedups of up to 7 while delivering comparable accuracy on 64 CPU cores of a compute cluster.
Citations: 6
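In de Bruijn graph-based assembly, every k-mer of every read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and contigs are spelled by walking unambiguous paths. The minimal sequential Python sketch below illustrates only that core step; HiPGA's parallel file I/O and hybrid parallelism are not shown, and the function names are hypothetical. Real assemblers also handle reverse complements and erroneous k-mers, and check in-degree as well as out-degree when extending a contig; this sketch checks out-degree only.

```python
from collections import defaultdict
from typing import Dict, List

def build_de_bruijn(reads: List[str], k: int) -> Dict[str, List[str]]:
    """Map each (k-1)-mer to the sorted (k-1)-mers that follow it in some k-mer."""
    succ = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            succ[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return {node: sorted(nexts) for node, nexts in succ.items()}

def walk_unambiguous(graph: Dict[str, List[str]], start: str) -> str:
    """Spell a contig by following nodes that have exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in seen:                   # stop on cycles
            break
        seen.add(node)
        contig += node[-1]                 # each step adds one new base
    return contig
```

For the overlapping reads "ACGTT" and "CGTTA" with k = 3, the graph is the chain AC → CG → GT → TT → TA, and walking it from "AC" reconstructs "ACGTTA".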
XSW: Accelerating Biological Database Search on Xeon Phi
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.108
Lipeng Wang, Yuandong Chan, Xiaohui Duan, Haidong Lan, Xiangxu Meng, Weiguo Liu
Abstract: In this paper we present XSW, a new parallel Smith-Waterman algorithm for searching protein sequence databases on the Xeon Phi coprocessor. To make full use of the compute power of the many-core Xeon Phi hardware, we use a two-level parallelization scheme, coarse-grained parallelism at the thread level and fine-grained parallelism at the VPU level, to implement our algorithm. At the thread level, XSW employs multi-threading to implement the SIMD parallelism. At the VPU level, we use the Knights Corner instructions to gain more data parallelism. We have also reorganized the database and made use of the parallel shuffling operations on Xeon Phi to achieve better I/O efficiency. Evaluations on real protein sequence databases show that XSW achieves a peak performance of 70 GCUPS on a single Intel Xeon Phi 7110 card. Compared with two other well-parallelized Smith-Waterman implementations, the multi-core CPU-based SWIPE and the GPU-based CUDASW++ 3.0, XSW achieves much better performance than SWIPE, and comparable performance but better accuracy than CUDASW++ 3.0. To our knowledge, this is the first reported implementation of the Smith-Waterman algorithm on Xeon Phi. The executable binary code of XSW is available at http://sdu-hpcl.github.io/XSW/.
Citations: 30
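To make the underlying computation concrete, the minimal Python sketch below computes the Smith-Waterman local-alignment score with a two-row dynamic-programming table and scores a query against every sequence of a small database. It is a sketch under simplifying assumptions, not XSW's method: protein search normally uses a substitution matrix such as BLOSUM62 with affine gap penalties, whereas this sketch uses match/mismatch scores and a linear gap penalty. The function names and default parameters are illustrative.

```python
from typing import List, Tuple

def sw_score(a: str, b: str, match: int = 3, mismatch: int = -3, gap: int = -2) -> int:
    """Best local-alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def search(query: str, database: List[str]) -> List[Tuple[int, int]]:
    """Score the query against every database sequence; return (score, index), best first.

    Each sw_score call is independent, which is the inter-sequence
    (coarse-grained) parallelism that a thread-level scheme can exploit.
    """
    scores = [(sw_score(query, seq), idx) for idx, seq in enumerate(database)]
    return sorted(scores, reverse=True)
```

With these parameters, `sw_score("TGTTACGG", "GGTTGACTA")` is 13 (local alignment GTT-AC / GTTGAC). Fine-grained (VPU-level) parallelism would instead vectorize the inner loop over DP cells, which this sketch does not attempt.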