2015 IEEE International Parallel and Distributed Processing Symposium Workshop: Latest Publications

Introducing Tetra: An Educational Parallel Programming System
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.51
Ian Finlayson, Jerome Mueller, S. Rajapakse, Daniel Easterling
Abstract: Despite the fact that we are firmly in the multicore era, the use of parallel programming is not as widespread as it could be, either in the software industry or in education. There have been many calls to incorporate more parallel programming content into undergraduate computer science education. One obstacle is that the languages most commonly used for parallel programming are detailed, low-level languages such as C, C++, and Fortran (with OpenMP or MPI), OpenCL, and CUDA. These languages allow programmers to write very efficient code, but that is less important for those whose goal is to learn the concepts of parallel computing. This paper introduces a parallel programming language called Tetra, which provides parallelism as a first-class language feature, includes garbage collection, and is designed to be as simple as possible. Tetra also includes an integrated development environment specifically geared toward debugging parallel programs and visualizing program execution across multiple threads.
Citations: 9
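For context on the low-level style the abstract contrasts Tetra with, the following is a minimal C/OpenMP parallel reduction of the kind students would otherwise face; it is only an illustrative sketch of that baseline (the array size and contents are arbitrary), not Tetra code.

```c
/* Minimal C/OpenMP parallel reduction, shown only to illustrate the
 * low-level style the paper contrasts Tetra with; not Tetra code. */
#include <stdio.h>
#include <omp.h>

int main(void) {
    enum { N = 1000000 };
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        a[i] = 1.0;                     /* placeholder data */

    /* The programmer must manage sharing and reduction explicitly. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f (threads available: %d)\n", sum, omp_get_max_threads());
    return 0;
}
```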
Auto-tuning Non-blocking Collective Communication Operations
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.15
Youcef Barigou, V. Venkatesan, E. Gabriel
Abstract: Collective operations are widely used in large-scale scientific applications and are critical to the scalability of these applications at large process counts. It has also been demonstrated that collective operations have to be carefully tuned for a given platform and application scenario to maximize their performance. Non-blocking collective operations extend the concept of collective operations by offering the additional benefit of being able to overlap communication and computation. This paper presents the automatic run-time tuning of non-blocking collective communication operations, which allows the communication library to choose the best-performing implementation for a non-blocking collective operation on a case-by-case basis. The paper demonstrates that libraries using a single algorithm or implementation for a non-blocking collective operation will inevitably deliver suboptimal performance in many scenarios, which validates the necessity of run-time tuning for these operations. The benefits of the approach are further demonstrated for an application kernel using a multi-dimensional Fast Fourier Transform. The results obtained for the application scenario indicate a performance improvement of up to 40% compared to the current state of the art.
Citations: 4
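The overlap of communication and computation that the abstract describes is the standard use case for MPI non-blocking collectives; a minimal sketch of that pattern with MPI_Iallreduce follows (the "independent work" loop is a placeholder for application computation, not code from the paper).

```c
/* Sketch of communication/computation overlap with a non-blocking
 * collective (MPI_Iallreduce, MPI 3.0); the independent work is a
 * placeholder for real application computation. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    double local = 1.0, global = 0.0;
    MPI_Request req;

    /* Start the collective, but do not block on it yet. */
    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* Independent computation that overlaps with the communication. */
    double work = 0.0;
    for (int i = 0; i < 1000000; i++)
        work += i * 1e-9;

    /* Complete the collective before using its result. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("reduced = %f, overlapped work = %f\n", global, work);

    MPI_Finalize();
    return 0;
}
```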
Streamlining Whole Function Vectorization in C Using Higher Order Vector Semantics
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.37
Gil Rapaport, A. Zaks, Y. Ben-Asher
Abstract: Taking full advantage of SIMD instructions in C programs still requires tedious and non-portable programming using intrinsics, despite considerable effort spent developing auto-vectorization capabilities in recent decades. Whole Function Vectorization (WFV) is a recent technique for extending the use of SIMD across entire functions. WFV has so far only been used in data-parallel languages such as OpenCL and ISPC. We propose a vector-oriented programming framework that facilitates WFV directly in C. We show that our framework achieves performance competitive with OpenCL and ISPC while maintaining C's original syntax and semantics. This allows C programmers to gain better performance for their applications by improving SIMD utilization, without stepping out of C.
Citations: 8
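As a reference point for the "tedious and non-portable" intrinsics the abstract mentions, here is a small hand-written SSE example in C that adds two float arrays four lanes at a time; it illustrates the intrinsic style only and is not the paper's framework (array size and contents are arbitrary).

```c
/* Hand-written SSE intrinsics for 4-wide float addition, shown only to
 * illustrate the intrinsic style the paper aims to replace; x86 specific. */
#include <xmmintrin.h>
#include <stdio.h>

int main(void) {
    enum { N = 8 };                      /* must be a multiple of 4 here */
    float a[N], b[N], c[N];

    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    for (int i = 0; i < N; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]); /* load 4 floats */
        __m128 vb = _mm_loadu_ps(&b[i]);
        __m128 vc = _mm_add_ps(va, vb);  /* 4 additions in one instruction */
        _mm_storeu_ps(&c[i], vc);
    }

    for (int i = 0; i < N; i++)
        printf("c[%d] = %f\n", i, c[i]);
    return 0;
}
```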
Performance Portable Applications for Hardware Accelerators: Lessons Learned from SPEC ACCEL
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.26
G. Juckeland, Alexander Grund, W. Nagel
Abstract: The popular and diverse hardware accelerator ecosystem makes apples-to-apples comparisons between platforms rather difficult. SPEC ACCEL tries to offer a yardstick to compare different accelerator hardware and software ecosystems. This paper uses this SPEC benchmark to compare an AMD GPU, an NVIDIA GPU, and an Intel Xeon Phi with respect to performance and energy consumption. It also provides observations on the performance portability between the different platforms. Since the SPEC ACCEL OpenACC suite cannot yet be run on a Xeon Phi, that suite was ported to OpenMP 4.0 target directives to enable a comparison. The challenges and solutions encountered in porting the 15 applications are described as well.
Citations: 10
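The porting described in the abstract maps OpenACC directives onto OpenMP 4.0 target directives; the sketch below shows what such a mapping can look like for a simple vector-scale loop. The loop is a made-up placeholder under that assumption, not code taken from SPEC ACCEL.

```c
/* Illustrative OpenACC -> OpenMP 4.0 directive mapping on a toy loop;
 * not taken from SPEC ACCEL. Unknown pragmas are ignored by compilers
 * without the corresponding support. */
#include <stdio.h>
#define N 1024

void scale_openacc(float *x, float s) {
    /* OpenACC: offload the loop, copying x to and from the device. */
    #pragma acc parallel loop copy(x[0:N])
    for (int i = 0; i < N; i++)
        x[i] *= s;
}

void scale_openmp4(float *x, float s) {
    /* OpenMP 4.0 target directives expressing the same offload. */
    #pragma omp target teams distribute parallel for map(tofrom: x[0:N])
    for (int i = 0; i < N; i++)
        x[i] *= s;
}

int main(void) {
    static float x[N];
    for (int i = 0; i < N; i++) x[i] = 1.0f;
    scale_openacc(x, 2.0f);        /* with an OpenACC compiler */
    scale_openmp4(x, 0.5f);        /* with an OpenMP 4.0 compiler */
    printf("x[0] = %f\n", x[0]);   /* 1.0 after both scalings */
    return 0;
}
```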
Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.35
E. Agullo, Olivier Beaumont, Lionel Eyraud-Dubois, J. Herrmann, Suraj Kumar, L. Marchal, Samuel Thibault
Abstract: We consider the problem of allocating and scheduling dense linear algebra applications on fully heterogeneous platforms made of CPUs and GPUs. More specifically, we focus on the Cholesky factorization, since it exhibits the main features of such problems. Indeed, the relative performance of CPUs and GPUs depends strongly on the sub-routine: GPUs are, for instance, much more efficient at processing regular kernels such as matrix-matrix multiplications than more irregular kernels such as matrix factorization. In this context, one solution consists in relying on dynamic scheduling and resource allocation mechanisms such as those provided by PaRSEC or StarPU. In this paper we analyze the performance of dynamic schedulers based on both actual executions and simulations, and we investigate how adding static rules, based on an offline analysis of the problem, to their decision process can improve their performance, up to the point of reaching improved theoretical performance bounds which we introduce.
Citations: 25
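The tiled Cholesky factorization discussed in the abstract is typically expressed as a task graph over POTRF, TRSM, SYRK, and GEMM kernels that a runtime such as StarPU or PaRSEC schedules dynamically. The sketch below shows only the standard right-looking task-loop structure: the "kernels" print which tile they would touch, so the program traces the task graph rather than doing numerical work; it is not the paper's code or any runtime's API.

```c
/* Right-looking tiled Cholesky task structure, of the kind scheduled by
 * runtimes such as StarPU or PaRSEC. The "kernels" only print the task
 * they stand for, so running this traces the task graph order. */
#include <stdio.h>

static void potrf(int k)               { printf("POTRF tile (%d,%d)\n", k, k); }
static void trsm (int k, int i)        { printf("TRSM  tile (%d,%d) using (%d,%d)\n", i, k, k, k); }
static void syrk (int k, int i)        { printf("SYRK  tile (%d,%d) using (%d,%d)\n", i, i, i, k); }
static void gemm (int k, int i, int j) { printf("GEMM  tile (%d,%d) using (%d,%d),(%d,%d)\n", i, j, i, k, j, k); }

int main(void) {
    const int nt = 4;                  /* tiles per matrix dimension */
    for (int k = 0; k < nt; k++) {
        potrf(k);                              /* factor diagonal tile  */
        for (int i = k + 1; i < nt; i++)
            trsm(k, i);                        /* solve panel below it  */
        for (int i = k + 1; i < nt; i++) {
            syrk(k, i);                        /* update diagonal tiles */
            for (int j = k + 1; j < i; j++)
                gemm(k, i, j);                 /* update trailing tiles */
        }
    }
    return 0;
}
```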
A Crossbar Interconnection Network in DNA
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.103
B. Talawar
Abstract: DNA computers provide exciting challenges and opportunities in the fields of computer architecture, neural networks, autonomous micromechanical devices, and chemical reaction networks. The advent of digital abstractions such as the seesaw gate holds many opportunities for computer architects to realize complex digital circuits using DNA strand displacement principles. The paper presents a realization of a single-bit, 2×2 crossbar interconnection network built using seesaw gates. The functional correctness of the implemented crossbar was verified using a chemical reaction simulator.
Citations: 1
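Independently of its DNA realization, the function a single-bit 2×2 crossbar computes is simple: a control bit either passes the two inputs straight through or swaps them. The C sketch below states only that Boolean reference behaviour; it says nothing about the seesaw-gate construction itself.

```c
/* Logical behaviour of a single-bit 2x2 crossbar switch: control = 0
 * passes inputs straight through, control = 1 crosses them. This is only
 * the Boolean reference function, not the DNA seesaw-gate realization. */
#include <stdio.h>

static void crossbar2x2(int in0, int in1, int control, int *out0, int *out1) {
    *out0 = control ? in1 : in0;
    *out1 = control ? in0 : in1;
}

int main(void) {
    int o0, o1;
    crossbar2x2(1, 0, 0, &o0, &o1);   /* straight: outputs 1, 0 */
    printf("straight: %d %d\n", o0, o1);
    crossbar2x2(1, 0, 1, &o0, &o1);   /* crossed:  outputs 0, 1 */
    printf("crossed:  %d %d\n", o0, o1);
    return 0;
}
```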
Parallel Asynchronous Modified Newton Methods for Network Flows
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.34
D. E. Baz, M. Elkihel
Abstract: We consider single-commodity strictly convex network flow problems. The dual problem is unconstrained, differentiable, and well suited for solution via parallel iterative methods. We propose parallel asynchronous modified Newton algorithms for solving the dual problem and prove their convergence. Parallel asynchronous Newton multisplitting algorithms are also considered, and their convergence is shown as well. A first set of computational results is presented and analyzed.
Citations: 1
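As a much-simplified reference for the "modified Newton" ingredient, the sketch below runs a scalar Newton-type iteration in which the exact second derivative is replaced by a fixed positive approximation. The objective and the constant are toy assumptions chosen for illustration; this is neither the paper's parallel asynchronous scheme nor its network-flow dual.

```c
/* Toy modified Newton iteration on f(x) = x^4/4 - x: the true second
 * derivative is replaced by a fixed positive approximation, which is the
 * "modified" ingredient. Scalar and sequential; only an illustration. */
#include <stdio.h>
#include <math.h>

int main(void) {
    double x = 2.0;            /* starting point (arbitrary) */
    const double h = 12.0;     /* fixed Hessian approximation (assumed) */
    for (int k = 0; k < 200; k++) {
        double grad = x * x * x - 1.0;    /* f'(x) */
        if (fabs(grad) < 1e-12)
            break;
        x -= grad / h;                    /* modified Newton step */
    }
    printf("approximate minimizer: %f (exact: 1.0)\n", x);
    return 0;
}
```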
Cache Support in a High Performance Fault-Tolerant Distributed Storage System for Cloud and Big Data
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.65
L. Lundberg, Håkan Grahn, D. Ilie, C. Melander
Abstract: Due to the trends towards Big Data and Cloud Computing, one would like to provide large storage systems that are accessible by many servers. A shared storage can, however, become a performance bottleneck and a single point of failure. Distributed storage systems provide a shared storage to the outside world, but internally they consist of a network of servers and disks, thus avoiding the performance bottleneck and single-point-of-failure problems. We introduce a cache in a distributed storage system. The cache system must be fault tolerant so that no data is lost in case of a hardware failure. This requirement excludes the use of the common write-invalidate cache consistency protocols. The cache is implemented and evaluated in two steps. The first step focuses on design decisions that improve the performance when only one server uses a given file. In the second step we extend the cache with features that address the case when more than one server accesses the same file. The cache improves the throughput significantly compared to having no cache. The two-step evaluation approach makes it possible to quantify how different design decisions affect the performance of different use cases.
Citations: 3
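The abstract rules out write-invalidate protocols for fault-tolerance reasons, presumably because a writer's cache would then hold the only current copy until write-back. A general alternative is to propagate writes, as in write-through; the sketch below shows that generic idea only. All names and structures are hypothetical and not taken from the paper's system.

```c
/* Minimal write-through illustration: the backing store is updated before
 * the cached copy, so losing the cache never loses data. Hypothetical
 * names; the paper's distributed cache is far more involved. */
#include <stdio.h>
#include <string.h>

#define BLOCKS 4
#define BLOCK_SIZE 16

static char backing_store[BLOCKS][BLOCK_SIZE]; /* stands in for the
                                                  fault-tolerant storage */
static char cache[BLOCKS][BLOCK_SIZE];
static int  cached[BLOCKS];                    /* 1 if block is cached */

static void write_block(int block, const char data[BLOCK_SIZE]) {
    memcpy(backing_store[block], data, BLOCK_SIZE); /* persist first ...  */
    memcpy(cache[block], data, BLOCK_SIZE);         /* ... then cache it  */
    cached[block] = 1;
}

static const char *read_block(int block) {
    if (!cached[block]) {                           /* miss: fill from store */
        memcpy(cache[block], backing_store[block], BLOCK_SIZE);
        cached[block] = 1;
    }
    return cache[block];
}

int main(void) {
    char buf[BLOCK_SIZE] = "hello";
    write_block(2, buf);
    printf("read back: %s\n", read_block(2));
    return 0;
}
```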
HPBC Introduction and Committees
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.162
E. Aubanel, V. Bhavsar, M. Frumkin
Citations: 0
Towards a Combined Grouping and Aggregation Algorithm for Fast Query Processing in Columnar Databases with GPUs
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.21
S. Meraji, John Keenleyside, Sunil Kamath, Bob Blainey
Abstract: Column-store in-memory databases have received a lot of attention because of their fast query processing response times on modern multi-core machines. Among the different database operations, group by/aggregate is an important and potentially costly operation, and sort-based and hash-based algorithms are the most common ways of processing group by/aggregate queries. While sort-based algorithms are used in traditional Database Management Systems (DBMS), hash-based algorithms can be applied for faster query processing in newer columnar databases. In addition, Graphics Processing Units (GPUs) can be utilized as fast, high-bandwidth co-processors to improve the query processing performance of columnar databases. The focus of this article is on the prototype for group by/aggregate operations that we created to exploit GPUs. We present different hash-based algorithms to improve the performance of group by/aggregate operations on the GPU. Parameters that affect the performance of the group by/aggregate algorithm include the number of groups and the hashing algorithm. We show that we can get up to a 7.6x improvement in kernel performance compared to a multi-core CPU implementation when we use a partitioned multi-level hash algorithm using GPU shared and global memories.
Citations: 4
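A hash-based group by/aggregate of the kind the abstract discusses buckets each row's grouping key into a hash table and accumulates the aggregate in the matching slot. The sketch below shows that core idea in plain sequential C; the GPU partitioning, the shared/global-memory levels, and the concrete hash function from the paper are omitted, and the hash and table size used here are arbitrary choices.

```c
/* Core of a hash-based GROUP BY key, SUM(value): sequential sketch of the
 * idea only; the paper's GPU version partitions work across threads and
 * uses a multi-level hash in shared and global memory. */
#include <stdio.h>

#define TABLE_SIZE 1024              /* assumed large enough for the groups */

typedef struct {
    int used;
    int key;                         /* grouping key */
    long long sum;                   /* running aggregate */
} slot_t;

static slot_t table[TABLE_SIZE];

static void group_add(int key, int value) {
    unsigned h = (unsigned)key * 2654435761u % TABLE_SIZE;  /* simple hash */
    while (table[h].used && table[h].key != key)            /* linear probe */
        h = (h + 1) % TABLE_SIZE;
    table[h].used = 1;
    table[h].key = key;
    table[h].sum += value;
}

int main(void) {
    int keys[]   = { 1, 2, 1, 3, 2, 1 };
    int values[] = { 10, 20, 30, 40, 50, 60 };
    for (int i = 0; i < 6; i++)
        group_add(keys[i], values[i]);

    for (int i = 0; i < TABLE_SIZE; i++)
        if (table[i].used)
            printf("key %d -> sum %lld\n", table[i].key, table[i].sum);
    return 0;
}
```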