2014 IEEE International Parallel & Distributed Processing Symposium Workshops: Latest Literature, Page 8

Automated Hybrid Interconnect Design for FPGA Accelerators Using Data Communication Profiling

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date: 2014-05-19 DOI: 10.1109/IPDPSW.2014.21

C. Pham-Quoc, Z. Al-Ars, K. Bertels

Citations: 4

HCW 2014 Keynote Talk

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date: 2014-05-19 DOI: 10.1109/IPDPSW.2014.207

D. Abramson

Abstract: Summary form only given. CCDB implements a strategy called "Comparative Debugging", which helps trace software errors by comparing two executions of a program at the same time, one a reference version and the other faulty. Specifically, users write "assertions" that detect when data structure contents in the two executions diverge, and the dataflow of the code then makes it possible to locate the source of the divergence. Comparative debugging is effective at finding errors when code is migrated from one platform to another, which is of significant interest for hybrid computer architectures containing CPUs and accelerators. In this talk I will discuss the design and implementation of CCDB and show that it operates on highly parallel hybrid CPU/GPU systems. CCDB provides a uniform comparison interface that allows programmers to examine the global runtime status across different types of hybrid programs, including OpenACC and UPC programs. I will present a case study in finding errors using the hybrid version of the stellarator particle simulation DELTA5D on the Titan machine at ORNL. I will also illustrate that the debugger scales well and is effective with up to 10,000 nodes and 5,000 GPUs.

Citations: 0

Using Patterns to Teach Parallel Computing

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date: 2014-05-19 DOI: 10.1109/IPDPSW.2014.123

C. Ferner, B. Wilkinson, Barbara P. Heath

Abstract: In this paper, we describe the results of teaching a parallel programming course using a pattern programming approach. The course was taught across the State of North Carolina on a televideo network in Fall 2013, and five universities participated in the study. The course begins with a higher-level tool called the Seeds framework, which creates and executes high-level message passing patterns such as a workpool without requiring low-level MPI code. To avoid going directly to MPI next, we used another tool (the Paraguin compiler), which uses compiler directives to create MPI code for patterns. Once students understand the pattern programming approach, we present the low-level MPI routines and their more complex parameters, now with the knowledge of parallel patterns. An independent professional evaluator was employed to deploy survey instruments and produce an analysis of the results. The lessons we learned from the data collected in Fall 2013 are: 1) Teaching parallel computing in the context of patterns has a positive impact on student learning, 2) Teaching the lower level tools first would be beneficial, 3) The improvements made to the Paraguin compiler directives significantly improved the students' confidence in using the tool, and 4) The lower level tools can still be taught in the context of patterns.

Citations: 6

SOM Clustering Using Spark-MapReduce

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date: 2014-05-19 DOI: 10.1109/IPDPSW.2014.192

Tugdual Sarazin, Hanene Azzag, M. Lebbah

Citations: 29

Runtime Behavior Comparison of Modern Accelerators and Coprocessors

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date: 2014-05-19 DOI: 10.1109/IPDPSW.2014.16

Ayman Tarakji, Niels Ole Salscheider

Citations: 4

GPU Enhanced Path Finding for an Unmanned Aerial Vehicle

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date: 2014-05-19 DOI: 10.1109/IPDPSW.2014.144

Roksana Hossain, S. Magierowski, G. Messier

Citations: 5

Autotuning Tensor Transposition

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date: 2014-05-19 DOI: 10.1109/IPDPSW.2014.43

Lai Wei, J. Mellor-Crummey

Citations: 9

Evaluation of the Global Address Space Programming Interface (GASPI)

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date: 2014-05-19 DOI: 10.1109/IPDPSW.2014.83

Jens Breitbart, Mareike Schmidtobreick, V. Heuveline

Abstract: The first exascale supercomputers are expected by the end of this decade and will presumably feature an increase in core count but a decrease in the amount of memory available per core. As of now, it is still unclear whether the current programming models will provide high performance on exascale systems. One programming model considered to be an alternative to MPI is the so-called partitioned global address space (PGAS) model. In this paper we evaluate a relatively new PGAS API, the Global Address Space Programming Interface (GASPI), and compare it to MPI on the basis of microbenchmarks. These benchmarks show that GASPI provides about the same level of performance for single-threaded communication, but is up to an order of magnitude faster than both Intel and IBM MPI for multi-threaded communication. We then discuss the different features of GASPI in comparison to two main PGAS languages, namely UPC and CAF. In addition, we present a basic numerical algorithm, a dense matrix-matrix multiplication, as an example of how an implementation can make efficient use of GASPI's features, especially the asynchronous and one-sided communication mechanisms.

Citations: 8

Hierarchical Pipeline Optimization of Coarse Grained Reconfigurable Processor for Multimedia Applications

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date: 2014-05-19 DOI: 10.1109/IPDPSW.2014.38

Chen Mei, Peng Cao, Yang Zhang, Bo Liu, Leibo Liu

Abstract: Driven by consumer demand, the multimedia market is booming and video coding standards are evolving rapidly. A dynamically reconfigurable coarse-grained architecture, REMUS-II (REconfigurable MUltimedia System 2), has been developed as a multi-standard, high-resolution, power-efficient, real-time multimedia decoding processor. REMUS-II adopts a hierarchical pipeline for multimedia applications. This paper details the implementation of pipeline optimization techniques for algorithm and architecture co-design. At each level, the key factors that influence pipeline performance are analyzed and optimized, including the computational components, the hierarchical memory interfaces, the synchronization mechanisms, and the balanced task assignments. The experimental results show that, compared to the original version, the proposed methods improve H.264/AVC decoding performance by a factor of 2.93. After optimization, REMUS-II can decode 1080p streams in real time for multiple standards, including H.264/AVC High Profile, MPEG-2 Main Profile, and AVS Jizhun Profile.

Citations: 2

Hybrid Multi-elimination ILU Preconditioners on GPUs

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date: 2014-05-19 DOI: 10.1109/IPDPSW.2014.7

D. Lukarski, H. Anzt, S. Tomov, J. Dongarra

Citations: 3