2014 IEEE International Parallel & Distributed Processing Symposium Workshops: Latest Articles

Automated Hybrid Interconnect Design for FPGA Accelerators Using Data Communication Profiling
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.21
C. Pham-Quoc, Z. Al-Ars, K. Bertels
{"title":"Automated Hybrid Interconnect Design for FPGA Accelerators Using Data Communication Profiling","authors":"C. Pham-Quoc, Z. Al-Ars, K. Bertels","doi":"10.1109/IPDPSW.2014.21","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.21","url":null,"abstract":"In this paper, we introduce an automated interconnect design strategy to create an efficient custom interconnect for kernels in an FPGA-based accelerator system to accelerate their communication behavior. Our custom interconnect includes an NoC, shared local memory solution or both. Depending on the quantitative communication profiling of the application, the interconnect is built using our proposed custom interconnect design algorithm and adaptive mapping function. Experimental results show that our system achieves an overall application speed-up of 3.72× compared to software and of 2.87× compared to the baseline system - a conventional FPGA bus-based accelerator system. Moreover, our proposed system achieves 66.5% energy reduction due to the reduced execution time.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129571296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
HCW 2014 Keynote Talk
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.207
D. Abramson
{"title":"HCW 2014 Keynote Talk","authors":"D. Abramson","doi":"10.1109/IPDPSW.2014.207","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.207","url":null,"abstract":"Summary form only given. CCDB, implements a strategy called \"Comparative Debugging\", which helps trace software errors by comparing two executions of a program at the same time - one code being a reference version and the other faulty. Specifically, users write \"assertions\" that detect when data structure contents in the two executions diverge, and using the dataflow of the code it is possible to locate the source of the divergence. Comparative debugging is effective at finding errors when code is migrated from one platform to another, and this is of significant interest for hybrid computer architectures containing CPUs and accelerators. In this talk I will discuss the design and implementation of CCDB, and show that it operates on highly parallel hybrid CPU/GPU systems. CCDB provides a uniform comparison interface that allows programmers to examine the global runtime status across different types of hybrid programs, including OpenACC and UPC programs. I will present a case study in finding errors using the hybrid version of the stellarator particle simulation DELTA5D, on the Titan machine at ORNL. I will also illustrate that the debugger scales well, and is effective with up to 10,000 nodes and 5,000 GPUs.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130811177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Using Patterns to Teach Parallel Computing
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.123
C. Ferner, B. Wilkinson, Barbara P. Heath
{"title":"Using Patterns to Teach Parallel Computing","authors":"C. Ferner, B. Wilkinson, Barbara P. Heath","doi":"10.1109/IPDPSW.2014.123","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.123","url":null,"abstract":"In this paper, we describe the results of teaching a parallel programming course using a pattern programming approach in a course taught across the State of North Carolina on a televideo network in Fall 2013. Five universities participated in this study. The course begins with a higher-level tool called the Seeds framework that creates and executes high-level message passing patterns such as a workpool without writing low level MPI code. To avoid going directly to MPI next, we used another tool (Paraguin compiler) which uses compiler directives to create MPI code for patterns. Once students understand the pattern programming approach we then present low level MPI routines and their more complex parameters but now with the knowledge of parallel patterns. An independent professional evaluator is employed to deploy survey instruments and produce an analysis of the results. The lessons we learned from this data collected in Fall 2013 are: 1) Teaching parallel computing in the context of patterns has a positive impact on student learning, 2) Teaching the lower level tools first would be beneficial, 3) The improvements made to the Paraguin compiler directives significantly improved the students confidence in using the tool, and 4) The lower level tools can still be taught in the context of patterns.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133243415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
SOM Clustering Using Spark-MapReduce
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.192
Tugdual Sarazin, Hanene Azzag, M. Lebbah
{"title":"SOM Clustering Using Spark-MapReduce","authors":"Tugdual Sarazin, Hanene Azzag, M. Lebbah","doi":"10.1109/IPDPSW.2014.192","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.192","url":null,"abstract":"In this paper, we consider designing clustering algorithms that can be used in MapReduce using Spark platform, one of the most popular programming environment for processing large datasets. We focus on the practical and popular serial Self-organizing Map clustering algorithm (SOM). SOM is one of the famous unsupervised learning algorithms and it's useful for cluster analysis of large quantities of data. We have designed two scalable implementations of SOM-MapReduce algorithm. We report the experiments and demonstrated the performance in terms of classification accuracy, rand, speedup using real and synthetic data with 100 millions of points, using different cores.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"299 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114389651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 29
Runtime Behavior Comparison of Modern Accelerators and Coprocessors
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.16
Ayman Tarakji, Niels Ole Salscheider
{"title":"Runtime Behavior Comparison of Modern Accelerators and Coprocessors","authors":"Ayman Tarakji, Niels Ole Salscheider","doi":"10.1109/IPDPSW.2014.16","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.16","url":null,"abstract":"Recently, a variety of accelerator architectures became available in the field of high performance computing. Intel's MIC (Many Integrated Core) and both GPU architectures, NVIDIA's Kepler and AMD's Graphics Core Next, all represent the latest innovation in the field of general purpose computing accelerators. This paper explores several important characteristics of these architectures and investigates the impact of certain design factors on the achieved performance using the uCLbench micro-benchmarks, the NPB (NAS Parallel Benchmark) suite and diverse real-world applications from the field of physics. Based on the single unified programming interface OpenCL, we observe the run-time behavior of each test program on several test platforms. Major architectural discrepancies are studied and a higher level examination is discussed in details.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"183 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134460697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
GPU Enhanced Path Finding for an Unmanned Aerial Vehicle
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.144
Roksana Hossain, S. Magierowski, G. Messier
{"title":"GPU Enhanced Path Finding for an Unmanned Aerial Vehicle","authors":"Roksana Hossain, S. Magierowski, G. Messier","doi":"10.1109/IPDPSW.2014.144","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.144","url":null,"abstract":"Situated robots like unmanned aerial vehicles (UAVs) typically need to arrange their plans as a sequence of actions between multiple goal locations. Identifying the sequence of goals to plan for can be naturally cast in the form of the traveling salesman problem (TSP). By making faster decision, more complex real-time operations may be achieved. A graphics processing unit (GPU) is used in this work to enhance the computational execution rate. A genetic algorithm working in concert with a clustering algorithm is used to quickly compute the desired routes. Several algorithm customizations are made to address the GPU's limited memory space. The implemented GPU code works 4.8 times faster than serially implemented code and the algorithm can solve large problems with 4000 waypoints.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116146574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Autotuning Tensor Transposition
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.43
Lai Wei, J. Mellor-Crummey
{"title":"Autotuning Tensor Transposition","authors":"Lai Wei, J. Mellor-Crummey","doi":"10.1109/IPDPSW.2014.43","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.43","url":null,"abstract":"Tensor transposition, a generalization of matrix transposition, is an important primitive used when performing tensor contraction. Efficient implementation of tensor transposition for modern node architectures depends on various architecture capabilities such as cache and memory hierarchy, threads, and SIMD parallelism. This paper introduces a framework that uses static analysis and empirical autotuning to produce optimized parallel tensor transposition code for node architectures using a rule-based code generation and transformation system. By exploring various optimization techniques with different settings, our framework achieves more than 80% of the bandwidth of memcpy for tensors on two very different node architectures, one a dual-socket system with Intel Westmere processors and the other a quad-socket system with IBM Power7 processors.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116614708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Evaluation of the Global Address Space Programming Interface (GASPI)
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.83
Jens Breitbart, Mareike Schmidtobreick, V. Heuveline
{"title":"Evaluation of the Global Address Space Programming Interface (GASPI)","authors":"Jens Breitbart, Mareike Schmidtobreick, V. Heuveline","doi":"10.1109/IPDPSW.2014.83","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.83","url":null,"abstract":"The first exascale supercomputers are expected by the end of this decade and will presumably feature an increase in core count, but a decrease in the amount of memory available per core. As of now, it is still unclear if the current programming models will provide high performance on exascale systems. One programming model considered to be an alternative to MPI is the so-called partitioned global address space (PGAS) model. Within this paper we evaluate a relatively new PGAS API: the Global Address Space Programming Interface (GASPI) and compare it to MPI on the basis of microbenchmarks. These benchmarks show that GASPI provides about the same level of performance for single-threaded communication, but is up to an order of magnitude faster than both Intel and IBM MPI for multi-threaded communication. Hereafter, we discuss the different features of GASPI in comparison to two main PGAS languages, namely UPC and CAF. In addition, we present a basic numerical algorithm, a dense matrix-matrix multiplication, as an example on how an implementation can make efficient use of GASPI's features, especially the asynchronous and one-sided communication mechanisms.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125794728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Hierarchical Pipeline Optimization of Coarse Grained Reconfigurable Processor for Multimedia Applications
2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI: 10.1109/IPDPSW.2014.38
Chen Mei, Peng Cao, Yang Zhang, Bo Liu, Leibo Liu
{"title":"Hierarchical Pipeline Optimization of Coarse Grained Reconfigurable Processor for Multimedia Applications","authors":"Chen Mei, Peng Cao, Yang Zhang, Bo Liu, Leibo Liu","doi":"10.1109/IPDPSW.2014.38","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.38","url":null,"abstract":"Nowadays, driven by the consumer demands, the multimedia market is booming and the video coding standards evolve rapidly. A dynamically coarse grain reconfigurable architecture REMUS-II (REconfigurable MUltimedia System 2) is developed as a multi-standards, high resolution, power efficient, and real-time multimedia decoding processor. The hierarchical pipeline is adopted in the REMUS-II for multimedia applications. This paper details the implementation of pipeline optimization techniques for the algorithm and architecture co-design. In each level, the key factors that influence the pipeline performance are analyzed and optimized, including the computational components, the hierarchical memory interfaces, the synchronization mechanisms, and the balanced task assignments. The experimental results show that, compared to original version, the decoding performance of H.264/AVC is improved 2.93 times by the proposed methods. After optimization, the REMUS-II can decode real-time 1080p streams of multi-standards, including H.264/AVC High Profile, MPEG-2 Main Profile, and AVS Jizhun Profile.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122197576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Hybrid Multi-elimination ILU Preconditioners on GPUs
D. Lukarski, H. Anzt, S. Tomov, J. Dongarra
{"title":"Hybrid Multi-elimination ILU Preconditioners on GPUs","authors":"D. Lukarski, H. Anzt, S. Tomov, J. Dongarra","doi":"10.1109/IPDPSW.2014.7","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.7","url":null,"abstract":"Iterative solvers for sparse linear systems often benefit from using preconditioners. While there exist implementations for many iterative methods that leverage the computing power of accelerators, porting the latest developments in preconditioners to accelerators has been challenging. In this paper we develop a selfadaptive multi-elimination preconditioner for graphics processing units (GPUs). The preconditioner is based on a multi-level incomplete LU factorization and uses a direct dense solver for the bottom-level system. For test matrices from the University of Florida matrix collection, we investigate the influence of handling the triangular solvers in the distinct iteration steps in either single or double precision arithmetic. Integrated into a Conjugate Gradient method, we show that our multi-elimination algorithm is highly competitive against popular preconditioners, including multi-colored symmetric Gauss-Seidel relaxation preconditioners, and (multi-colored symmetric) ILU for numerous problems.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125440041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3