2015 IEEE International Parallel and Distributed Processing Symposium Workshop: Latest Publications

Introducing Tetra: An Educational Parallel Programming System
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.51
Ian Finlayson, Jerome Mueller, S. Rajapakse, Daniel Easterling
Abstract: Despite the fact that we are firmly in the multicore era, the use of parallel programming is not as widespread as it could be, either in the software industry or in education. There have been many calls to incorporate more parallel programming content into undergraduate computer science education. One obstacle is that the languages most commonly used for parallel programming are detailed, low-level languages such as C, C++, and Fortran (with OpenMP or MPI), OpenCL, and CUDA. These languages allow programmers to write very efficient code, but that is less important for those whose goal is to learn the concepts of parallel computing. This paper introduces a parallel programming language called Tetra, which provides parallelism as a first-class language feature, includes garbage collection, and is designed to be as simple as possible. Tetra also includes an integrated development environment specifically geared toward debugging parallel programs and visualizing program execution across multiple threads.
Citations: 9
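For context on the low-level style the abstract contrasts Tetra with, the following is a minimal C/OpenMP parallel reduction of the kind students would otherwise face; it is only an illustrative sketch of that baseline (the array size and contents are arbitrary), not Tetra code.

```c
/* Minimal C/OpenMP parallel reduction, shown only to illustrate the
 * low-level style the paper contrasts Tetra with; not Tetra code. */
#include <stdio.h>
#include <omp.h>

int main(void) {
    enum { N = 1000000 };
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        a[i] = 1.0;                     /* placeholder data */

    /* The programmer must manage sharing and reduction explicitly. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f (threads available: %d)\n", sum, omp_get_max_threads());
    return 0;
}
```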
Auto-tuning Non-blocking Collective Communication Operations
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.15
Youcef Barigou, V. Venkatesan, E. Gabriel
Abstract: Collective operations are widely used in large-scale scientific applications and are critical to the scalability of these applications at large process counts. It has also been demonstrated that collective operations have to be carefully tuned for a given platform and application scenario to maximize their performance. Non-blocking collective operations extend the concept of collective operations by offering the additional benefit of being able to overlap communication and computation. This paper presents the automatic run-time tuning of non-blocking collective communication operations, which allows the communication library to choose the best-performing implementation for a non-blocking collective operation on a case-by-case basis. The paper demonstrates that libraries using a single algorithm or implementation for a non-blocking collective operation will inevitably deliver suboptimal performance in many scenarios, which validates the necessity of run-time tuning for these operations. The benefits of the approach are further demonstrated for an application kernel using a multi-dimensional Fast Fourier Transform. The results obtained for the application scenario indicate a performance improvement of up to 40% compared to the current state of the art.
Citations: 4
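The overlap of communication and computation that the abstract describes is the standard use case for MPI non-blocking collectives; a minimal sketch of that pattern with MPI_Iallreduce follows (the "independent work" loop is a placeholder for application computation, not code from the paper).

```c
/* Sketch of communication/computation overlap with a non-blocking
 * collective (MPI_Iallreduce, MPI 3.0); the independent work is a
 * placeholder for real application computation. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    double local = 1.0, global = 0.0;
    MPI_Request req;

    /* Start the collective, but do not block on it yet. */
    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);

    /* Independent computation that overlaps with the communication. */
    double work = 0.0;
    for (int i = 0; i < 1000000; i++)
        work += i * 1e-9;

    /* Complete the collective before using its result. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("reduced = %f, overlapped work = %f\n", global, work);

    MPI_Finalize();
    return 0;
}
```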
Streamlining Whole Function Vectorization in C Using Higher Order Vector Semantics
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.37
Gil Rapaport, A. Zaks, Y. Ben-Asher
Abstract: Taking full advantage of SIMD instructions in C programs still requires tedious and non-portable programming using intrinsics, despite considerable effort spent developing auto-vectorization capabilities in recent decades. Whole Function Vectorization (WFV) is a recent technique for extending the use of SIMD across entire functions. WFV has so far only been used in data-parallel languages such as OpenCL and ISPC. We propose a vector-oriented programming framework that facilitates WFV directly in C. We show that our framework achieves performance competitive with OpenCL and ISPC while maintaining C's original syntax and semantics. This allows C programmers to gain better performance for their applications by improving SIMD utilization, without stepping out of C.
Citations: 8
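As a reference point for the "tedious and non-portable" intrinsics the abstract mentions, here is a small hand-written SSE example in C that adds two float arrays four lanes at a time; it illustrates the intrinsic style only and is not the paper's framework (array size and contents are arbitrary).

```c
/* Hand-written SSE intrinsics for 4-wide float addition, shown only to
 * illustrate the intrinsic style the paper aims to replace; x86 specific. */
#include <xmmintrin.h>
#include <stdio.h>

int main(void) {
    enum { N = 8 };                      /* must be a multiple of 4 here */
    float a[N], b[N], c[N];

    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    for (int i = 0; i < N; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]); /* load 4 floats */
        __m128 vb = _mm_loadu_ps(&b[i]);
        __m128 vc = _mm_add_ps(va, vb);  /* 4 additions in one instruction */
        _mm_storeu_ps(&c[i], vc);
    }

    for (int i = 0; i < N; i++)
        printf("c[%d] = %f\n", i, c[i]);
    return 0;
}
```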
Performance Portable Applications for Hardware Accelerators: Lessons Learned from SPEC ACCEL
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.26
G. Juckeland, Alexander Grund, W. Nagel
Abstract: The popular and diverse hardware accelerator ecosystem makes apples-to-apples comparisons between platforms rather difficult. SPEC ACCEL tries to offer a yardstick to compare different accelerator hardware and software ecosystems. This paper uses this SPEC benchmark to compare an AMD GPU, an NVIDIA GPU, and an Intel Xeon Phi with respect to performance and energy consumption. It also provides observations on the performance portability between the different platforms. Since the SPEC ACCEL OpenACC suite cannot yet be run on a Xeon Phi, that suite was ported to OpenMP 4.0 target directives to enable a comparison. The challenges and solutions encountered in porting the 15 applications are described as well.
Citations: 10
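The porting described in the abstract maps OpenACC directives onto OpenMP 4.0 target directives; the sketch below shows what such a mapping can look like for a simple vector-scale loop. The loop is a made-up placeholder under that assumption, not code taken from SPEC ACCEL.

```c
/* Illustrative OpenACC -> OpenMP 4.0 directive mapping on a toy loop;
 * not taken from SPEC ACCEL. Unknown pragmas are ignored by compilers
 * without the corresponding support. */
#include <stdio.h>
#define N 1024

void scale_openacc(float *x, float s) {
    /* OpenACC: offload the loop, copying x to and from the device. */
    #pragma acc parallel loop copy(x[0:N])
    for (int i = 0; i < N; i++)
        x[i] *= s;
}

void scale_openmp4(float *x, float s) {
    /* OpenMP 4.0 target directives expressing the same offload. */
    #pragma omp target teams distribute parallel for map(tofrom: x[0:N])
    for (int i = 0; i < N; i++)
        x[i] *= s;
}

int main(void) {
    static float x[N];
    for (int i = 0; i < N; i++) x[i] = 1.0f;
    scale_openacc(x, 2.0f);        /* with an OpenACC compiler */
    scale_openmp4(x, 0.5f);        /* with an OpenMP 4.0 compiler */
    printf("x[0] = %f\n", x[0]);   /* 1.0 after both scalings */
    return 0;
}
```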
Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.35
E. Agullo, Olivier Beaumont, Lionel Eyraud-Dubois, J. Herrmann, Suraj Kumar, L. Marchal, Samuel Thibault
Abstract: We consider the problem of allocating and scheduling dense linear algebra applications on fully heterogeneous platforms made of CPUs and GPUs. More specifically, we focus on the Cholesky factorization, since it exhibits the main features of such problems. Indeed, the relative performance of CPUs and GPUs depends strongly on the sub-routine: GPUs are, for instance, much more efficient at processing regular kernels such as matrix-matrix multiplications than more irregular kernels such as matrix factorization. In this context, one solution consists in relying on dynamic scheduling and resource allocation mechanisms such as those provided by PaRSEC or StarPU. In this paper we analyze the performance of dynamic schedulers based on both actual executions and simulations, and we investigate how adding static rules, based on an offline analysis of the problem, to their decision process can improve their performance, up to the point of reaching improved theoretical performance bounds which we introduce.
Citations: 25
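The tiled Cholesky factorization discussed in the abstract is typically expressed as a task graph over POTRF, TRSM, SYRK, and GEMM kernels that a runtime such as StarPU or PaRSEC schedules dynamically. The sketch below shows only the standard right-looking task-loop structure: the "kernels" print which tile they would touch, so the program traces the task graph rather than doing numerical work; it is not the paper's code or any runtime's API.

```c
/* Right-looking tiled Cholesky task structure, of the kind scheduled by
 * runtimes such as StarPU or PaRSEC. The "kernels" only print the task
 * they stand for, so running this traces the task graph order. */
#include <stdio.h>

static void potrf(int k)               { printf("POTRF tile (%d,%d)\n", k, k); }
static void trsm (int k, int i)        { printf("TRSM  tile (%d,%d) using (%d,%d)\n", i, k, k, k); }
static void syrk (int k, int i)        { printf("SYRK  tile (%d,%d) using (%d,%d)\n", i, i, i, k); }
static void gemm (int k, int i, int j) { printf("GEMM  tile (%d,%d) using (%d,%d),(%d,%d)\n", i, j, i, k, j, k); }

int main(void) {
    const int nt = 4;                  /* tiles per matrix dimension */
    for (int k = 0; k < nt; k++) {
        potrf(k);                              /* factor diagonal tile  */
        for (int i = k + 1; i < nt; i++)
            trsm(k, i);                        /* solve panel below it  */
        for (int i = k + 1; i < nt; i++) {
            syrk(k, i);                        /* update diagonal tiles */
            for (int j = k + 1; j < i; j++)
                gemm(k, i, j);                 /* update trailing tiles */
        }
    }
    return 0;
}
```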
A Crossbar Interconnection Network in DNA
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.103
B. Talawar
Abstract: DNA computers provide exciting challenges and opportunities in the fields of computer architecture, neural networks, autonomous micromechanical devices, and chemical reaction networks. The advent of digital abstractions such as the seesaw gate holds many opportunities for computer architects to realize complex digital circuits using DNA strand displacement principles. The paper presents a realization of a single-bit, 2×2 crossbar interconnection network built using seesaw gates. The functional correctness of the implemented crossbar was verified using a chemical reaction simulator.
Citations: 1
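Independently of its DNA realization, the function a single-bit 2×2 crossbar computes is simple: a control bit either passes the two inputs straight through or swaps them. The C sketch below states only that Boolean reference behaviour; it says nothing about the seesaw-gate construction itself.

```c
/* Logical behaviour of a single-bit 2x2 crossbar switch: control = 0
 * passes inputs straight through, control = 1 crosses them. This is only
 * the Boolean reference function, not the DNA seesaw-gate realization. */
#include <stdio.h>

static void crossbar2x2(int in0, int in1, int control, int *out0, int *out1) {
    *out0 = control ? in1 : in0;
    *out1 = control ? in0 : in1;
}

int main(void) {
    int o0, o1;
    crossbar2x2(1, 0, 0, &o0, &o1);   /* straight: outputs 1, 0 */
    printf("straight: %d %d\n", o0, o1);
    crossbar2x2(1, 0, 1, &o0, &o1);   /* crossed:  outputs 0, 1 */
    printf("crossed:  %d %d\n", o0, o1);
    return 0;
}
```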
Parallel Asynchronous Modified Newton Methods for Network Flows
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.34
D. E. Baz, M. Elkihel
Abstract: We consider single-commodity strictly convex network flow problems. The dual problem is unconstrained, differentiable, and well suited for solution via parallel iterative methods. We propose parallel asynchronous modified Newton algorithms for solving the dual problem and prove their convergence. Parallel asynchronous Newton multisplitting algorithms are also considered, and their convergence is shown as well. A first set of computational results is presented and analyzed.
Citations: 1
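As a much-simplified reference for the "modified Newton" ingredient, the sketch below runs a scalar Newton-type iteration in which the exact second derivative is replaced by a fixed positive approximation. The objective and the constant are toy assumptions chosen for illustration; this is neither the paper's parallel asynchronous scheme nor its network-flow dual.

```c
/* Toy modified Newton iteration on f(x) = x^4/4 - x: the true second
 * derivative is replaced by a fixed positive approximation, which is the
 * "modified" ingredient. Scalar and sequential; only an illustration. */
#include <stdio.h>
#include <math.h>

int main(void) {
    double x = 2.0;            /* starting point (arbitrary) */
    const double h = 12.0;     /* fixed Hessian approximation (assumed) */
    for (int k = 0; k < 200; k++) {
        double grad = x * x * x - 1.0;    /* f'(x) */
        if (fabs(grad) < 1e-12)
            break;
        x -= grad / h;                    /* modified Newton step */
    }
    printf("approximate minimizer: %f (exact: 1.0)\n", x);
    return 0;
}
```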
Cache Support in a High Performance Fault-Tolerant Distributed Storage System for Cloud and Big Data
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.65
L. Lundberg, Håkan Grahn, D. Ilie, C. Melander
Abstract: Due to the trends towards Big Data and Cloud Computing, one would like to provide large storage systems that are accessible by many servers. A shared storage can, however, become a performance bottleneck and a single point of failure. Distributed storage systems provide a shared storage to the outside world, but internally they consist of a network of servers and disks, thus avoiding the performance bottleneck and single-point-of-failure problems. We introduce a cache in a distributed storage system. The cache system must be fault tolerant so that no data is lost in case of a hardware failure. This requirement excludes the use of the common write-invalidate cache consistency protocols. The cache is implemented and evaluated in two steps. The first step focuses on design decisions that improve the performance when only one server uses a given file. In the second step we extend the cache with features that address the case when more than one server accesses the same file. The cache improves the throughput significantly compared to having no cache. The two-step evaluation approach makes it possible to quantify how different design decisions affect the performance of different use cases.
Citations: 3
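The abstract rules out write-invalidate protocols for fault-tolerance reasons, presumably because a writer's cache would then hold the only current copy until write-back. A general alternative is to propagate writes, as in write-through; the sketch below shows that generic idea only. All names and structures are hypothetical and not taken from the paper's system.

```c
/* Minimal write-through illustration: the backing store is updated before
 * the cached copy, so losing the cache never loses data. Hypothetical
 * names; the paper's distributed cache is far more involved. */
#include <stdio.h>
#include <string.h>

#define BLOCKS 4
#define BLOCK_SIZE 16

static char backing_store[BLOCKS][BLOCK_SIZE]; /* stands in for the
                                                  fault-tolerant storage */
static char cache[BLOCKS][BLOCK_SIZE];
static int  cached[BLOCKS];                    /* 1 if block is cached */

static void write_block(int block, const char data[BLOCK_SIZE]) {
    memcpy(backing_store[block], data, BLOCK_SIZE); /* persist first ...  */
    memcpy(cache[block], data, BLOCK_SIZE);         /* ... then cache it  */
    cached[block] = 1;
}

static const char *read_block(int block) {
    if (!cached[block]) {                           /* miss: fill from store */
        memcpy(cache[block], backing_store[block], BLOCK_SIZE);
        cached[block] = 1;
    }
    return cache[block];
}

int main(void) {
    char buf[BLOCK_SIZE] = "hello";
    write_block(2, buf);
    printf("read back: %s\n", read_block(2));
    return 0;
}
```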
HPBC Introduction and Committees
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.162
E. Aubanel, V. Bhavsar, M. Frumkin
Citations: 0
Towards a Combined Grouping and Aggregation Algorithm for Fast Query Processing in Columnar Databases with GPUs
Pub Date: 2015-05-25 | DOI: 10.1109/IPDPSW.2015.21
S. Meraji, John Keenleyside, Sunil Kamath, Bob Blainey
Abstract: Column-store in-memory databases have received a lot of attention because of their fast query processing response times on modern multi-core machines. Among the different database operations, group by/aggregate is an important and potentially costly operation, and sort-based and hash-based algorithms are the most common ways of processing group by/aggregate queries. While sort-based algorithms are used in traditional Database Management Systems (DBMS), hash-based algorithms can be applied for faster query processing in newer columnar databases. In addition, Graphics Processing Units (GPUs) can be utilized as fast, high-bandwidth co-processors to improve the query processing performance of columnar databases. The focus of this article is on the prototype for group by/aggregate operations that we created to exploit GPUs. We present different hash-based algorithms to improve the performance of group by/aggregate operations on the GPU. Parameters that affect the performance of the group by/aggregate algorithm include the number of groups and the hashing algorithm. We show that we can get up to a 7.6x improvement in kernel performance compared to a multi-core CPU implementation when we use a partitioned multi-level hash algorithm using GPU shared and global memories.
Citations: 4
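A hash-based group by/aggregate of the kind the abstract discusses buckets each row's grouping key into a hash table and accumulates the aggregate in the matching slot. The sketch below shows that core idea in plain sequential C; the GPU partitioning, the shared/global-memory levels, and the concrete hash function from the paper are omitted, and the hash and table size used here are arbitrary choices.

```c
/* Core of a hash-based GROUP BY key, SUM(value): sequential sketch of the
 * idea only; the paper's GPU version partitions work across threads and
 * uses a multi-level hash in shared and global memory. */
#include <stdio.h>

#define TABLE_SIZE 1024              /* assumed large enough for the groups */

typedef struct {
    int used;
    int key;                         /* grouping key */
    long long sum;                   /* running aggregate */
} slot_t;

static slot_t table[TABLE_SIZE];

static void group_add(int key, int value) {
    unsigned h = (unsigned)key * 2654435761u % TABLE_SIZE;  /* simple hash */
    while (table[h].used && table[h].key != key)            /* linear probe */
        h = (h + 1) % TABLE_SIZE;
    table[h].used = 1;
    table[h].key = key;
    table[h].sum += value;
}

int main(void) {
    int keys[]   = { 1, 2, 1, 3, 2, 1 };
    int values[] = { 10, 20, 30, 40, 50, 60 };
    for (int i = 0; i < 6; i++)
        group_add(keys[i], values[i]);

    for (int i = 0; i < TABLE_SIZE; i++)
        if (table[i].used)
            printf("key %d -> sum %lld\n", table[i].key, table[i].sum);
    return 0;
}
```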