2011 Symposium on Application Accelerators in High-Performance Computing: Latest Publications

A Study of the Performance of Multifluid PPM Gas Dynamics on CPUs and GPUs
2011 Symposium on Application Accelerators in High-Performance Computing. Pub Date: 2011-07-19. DOI: 10.1109/SAAHPC.2011.27
Pei-Hung Lin, J. Jayaraj, P. Woodward
Abstract: The potential for GPUs and many-core CPUs to support high-performance computation in the area of computational fluid dynamics (CFD) is explored quantitatively through the example of the PPM gas dynamics code with PPB multifluid volume fraction advection. This code has already been implemented on the IBM Cell processor and run at full scale on the Los Alamos Roadrunner machine. That implementation involved a complete restructuring of the code that has been described in detail elsewhere. Here the lessons learned from that work are exploited to take advantage of today's latest generations of multi-core CPUs and many-core GPUs. The operations performed by this code are characterized in detail after first being decomposed into a series of individual code kernels to allow an implementation on GPUs. Careful implementations of this code for both CPUs and GPUs are then contrasted from a performance point of view. In addition, a single kernel that has many of the characteristics of the full application on CPUs has been built into a full, standalone, scalable parallel application. This single-kernel application shows the GPU at its best. In contrast, the full multifluid gas dynamics application brings into play computational requirements that highlight the essential differences in CPU and GPU designs today and the different programming strategies needed to achieve the best performance for applications of this type on the two devices. The single-kernel application code performs extremely well on both platforms. This application is not limited by main memory bandwidth on either device; instead it is limited only by the computational capability of each. In this case, the GPU has the advantage, because it has more computational cores. The full multifluid gas dynamics code is, however, of necessity memory-bandwidth limited on the GPU, while it is still computational-capability limited on the CPU. We believe that these codes provide a useful context for quantifying the costs and benefits of design decisions for these powerful new computing devices. Suggestions for improvements in both devices and codes based upon this work are offered in our conclusions.
Citations: 3
Real-Time Object Tracking System on FPGAs
2011 Symposium on Application Accelerators in High-Performance Computing. Pub Date: 2011-07-19. DOI: 10.1109/SAAHPC.2011.22
S. Liu, Alexandros Papakonstantinou, Hongjun Wang, Deming Chen
Abstract: Object tracking is an important task in computer vision applications. One of the crucial challenges is the real-time speed requirement. In this paper we implement an object tracking system in reconfigurable hardware using an efficient parallel architecture. In our implementation, we adopt a background subtraction based algorithm. The designed object tracker exploits hardware parallelism to achieve high system speed. We also propose a dual object region search technique to further boost the performance of our system under complex tracking conditions. For our hardware implementation we use the Altera Stratix III EP3SL340H1152C2 FPGA device. We compare the proposed FPGA-based implementation with the software implementation running on a 2.2 GHz processor. The observed speedup can reach more than 100X for complex video inputs.
Citations: 46
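The background-subtraction step at the heart of such trackers is simple to state in software; the sketch below is a hypothetical Python illustration of the general technique (per-pixel thresholded difference against a background model, then a bounding-box search), not the paper's hardware pipeline:

```python
def background_subtract(frame, background, threshold=30):
    """Return a binary foreground mask: 1 where a pixel differs
    from the background model by more than the threshold."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def bounding_box(mask):
    """Smallest (top, left, bottom, right) box covering all foreground
    pixels; a crude stand-in for the object-region search step."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None
    return (rows[0], cols[0], rows[-1], cols[-1])
```

In hardware, the per-pixel differences are independent, which is exactly the parallelism an FPGA pipeline exploits.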
Implications of Memory-Efficiency on Sparse Matrix-Vector Multiplication
2011 Symposium on Application Accelerators in High-Performance Computing. Pub Date: 2011-07-19. DOI: 10.1109/SAAHPC.2011.24
Shweta Jain, Robin Pottathuparambil, R. Sass
Abstract: Sparse matrix-vector multiplication is an important operation for many iterative solvers. However, peak performance is limited by the fact that the commonly used algorithm alternates between compute-bound and memory-bound steps. This paper proposes a novel data structure and an FPGA-based hardware core that eliminates the limitations imposed by memory.
Citations: 5
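The memory-bound character of the commonly used algorithm is visible even in a scalar CSR (compressed sparse row) kernel; the sketch below is a generic Python illustration of that baseline, not the paper's proposed data structure:

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a sparse matrix A in CSR form.
    Each y[i] is a dot product over the nonzeros of row i; the
    indirect loads x[col_idx[k]] are what make this memory-bound."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y
```

Each multiply-add consumes one matrix value, one column index, and one irregularly addressed vector element, so arithmetic intensity is low and memory traffic dominates.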
Porting Optimized GPU Kernels to a Multi-core CPU: Computational Quantum Chemistry Application Example
2011 Symposium on Application Accelerators in High-Performance Computing. Pub Date: 2011-07-19. DOI: 10.1109/SAAHPC.2011.8
Dong Ye, Alexey Titov, V. Kindratenko, Ivan S. Ufimtsev, Todd J. Martinez
Abstract: We investigate techniques for optimizing a multi-core CPU code back-ported from a highly optimized GPU kernel. We show that common sub-expression elimination and loop unrolling optimization techniques improve code performance on the GPU, but not on the CPU. On the other hand, register reuse and loop merging are effective on the CPU and in combination they improve performance of the ported code by 16%.
Citations: 8
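The transformations named in the abstract are easy to state in miniature. The toy Python functions below (hypothetical, not taken from the paper's kernels) show common sub-expression elimination applied by hand, the change that helped on the GPU but not on the CPU:

```python
def kernel_naive(a, b, c):
    # the sub-expression (a + b) is written (and potentially
    # recomputed) three times
    return (a + b) * (a + b) + (a + b) * c

def kernel_cse(a, b, c):
    # common sub-expression hoisted into a temporary, i.e. a register
    t = a + b
    return t * t + t * c
```

Both functions compute the same value; whether the hoisted form is faster depends on how many registers the target has to spare, which is one reason the same transformation can pay off on one architecture and not another.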
Experience Applying Fortran GPU Compilers to Numerical Weather Prediction
2011 Symposium on Application Accelerators in High-Performance Computing. Pub Date: 2011-07-19. DOI: 10.1109/SAAHPC.2011.9
T. Henderson, J. Middlecoff, J. Rosinski, M. Govett, P. Madden
Abstract: Graphics Processing Units (GPUs) have enabled significant improvements in computational performance compared to traditional CPUs in several application domains. Until recently, GPUs have been programmed using C/C++ based methods such as CUDA (NVIDIA) and OpenCL (NVIDIA and AMD). Using these approaches, Fortran Numerical Weather Prediction (NWP) codes would have to be completely re-written to take full advantage of GPU performance gains. Emerging commercial Fortran compilers allow NWP codes to take advantage of GPU processing power with much less software development effort. The Non-hydrostatic Icosahedral Model (NIM) is a prototype dynamical core for global NWP. We use NIM to examine Fortran directive-based GPU compilers, evaluating code porting effort and computational performance.
Citations: 24
A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures
2011 Symposium on Application Accelerators in High-Performance Computing. Pub Date: 2011-07-19. DOI: 10.1109/SAAHPC.2011.18
Mitchel D. Horton, S. Tomov, J. Dongarra
Abstract: Three of the top four supercomputers in the November 2010 TOP500 list of the world's most powerful supercomputers use NVIDIA GPUs to accelerate computations. Ninety-five systems on the list use processors with six or more cores, three hundred sixty-five use quad-core processors, and thirty-seven use dual-core processors. The large-scale enabling of hybrid graphics processing unit (GPU)-based multicore platforms for computational science, by developing fundamental numerical libraries for them (in particular, libraries in the area of dense linear algebra), has been underway for some time. We present a class of algorithms based largely on software infrastructures that have already been developed for homogeneous multicores and hybrid GPU-based computing. The algorithms extend what is currently available in the Matrix Algebra on GPU and Multicore Architectures (MAGMA) library for performing Cholesky, QR, and LU factorizations using a single core or socket and a single GPU. The extensions occur in two areas. First, panels factored on the CPU using LAPACK are instead done in parallel using a highly optimized, dynamically scheduled asynchronous algorithm on some number of CPU cores. Second, the remaining CPU cores are used to update the rightmost panels of the matrix in parallel.
Citations: 39
A First Analysis of a Dynamic Memory Allocation Controller (DMAC) Core
2011 Symposium on Application Accelerators in High-Performance Computing. Pub Date: 2011-07-19. DOI: 10.1109/SAAHPC.2011.23
Y. Rajasekhar, R. Sass
Abstract: Networking performance continues to grow, but processor clock frequencies have not. Likewise, the latency to primary memory is not expected to improve dramatically either. This is leading computer architects to reconsider the networking subsystem and the roles and responsibilities of hardware and the operating system. This paper presents the first component of a new networking subsystem in which the hardware is responsible for buffering messages, when necessary, without interrupting or involving the operating system. The design is presented and its functionality is demonstrated. The core on an FPGA is exercised with a synthetic stream of messages, and the results show that the analytical performance model and measured performance agree.
Citations: 2
Python for Development of OpenMP and CUDA Kernels for Multidimensional Data
2011 Symposium on Application Accelerators in High-Performance Computing. Pub Date: 2011-07-19. DOI: 10.1109/SAAHPC.2011.26
B. Vacaliuc, D. Patlolla, E. D'Azevedo, G. Davidson, John K. Munro Jr, T. Evans, W. Joubert, Z. Bell
Abstract: Design of data structures for high performance computing (HPC) is one of the principal challenges facing researchers looking to utilize heterogeneous computing machinery. Heterogeneous systems derive cost, power, and speed efficiency by being composed of the appropriate hardware for the task. Yet each type of processor requires a specific organization of the application state in order to achieve peak performance. Discovering this and refactoring the code can be a challenging and time-consuming task for the researcher, as the data structures and the computational model must be co-designed. We present a methodology that uses Python as the environment in which to explore tradeoffs in both the data structure design and the code executing on the computation accelerator. Our method enables multi-dimensional arrays to be used effectively in any target environment. We have chosen to focus on OpenMP and CUDA environments, thus exploring the development of optimized kernels for the two most common classes of computing hardware available today: multi-core CPU and GPU. Python's large palette of file and network access routines, its associative indexing syntax, and its support for common HPC environments make it relevant for diverse hardware ranging from laptops through computing clusters to the highest performance supercomputers. Our work enables researchers to accelerate the development of their codes on the computing hardware of their choice.
Citations: 1
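One concrete data-structure decision such an exploration surfaces is the memory layout of a multidimensional array. A hypothetical Python sketch of row-major (C-order) index linearization, the flat-buffer layout an OpenMP or CUDA kernel would typically assume for multidimensional data:

```python
def row_major_offset(index, shape):
    """Linear offset of a multidimensional index in a C-ordered
    (row-major) flat buffer: the last axis varies fastest.
    For shape (d0, d1, d2), offset = (i0 * d1 + i1) * d2 + i2."""
    offset = 0
    for i, n in zip(index, shape):
        assert 0 <= i < n, "index out of bounds"
        offset = offset * n + i
    return offset
```

Choosing row-major versus column-major order (or a blocked layout) changes which loop order gives contiguous, coalesced accesses, which is exactly the kind of tradeoff that is cheap to prototype in Python before committing to a kernel.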
Non-serial Polyadic Dynamic Programming on a Data-Parallel Many-core Architecture
2011 Symposium on Application Accelerators in High-Performance Computing. Pub Date: 2011-07-19. DOI: 10.1109/SAAHPC.2011.25
M. Moazeni, M. Sarrafzadeh, A. Bui
Abstract: Dynamic programming (DP) is a method for efficiently solving a broad range of search and optimization problems. As a result, techniques for managing large-scale DP problems are often critical to the performance of many applications. DP algorithms are often hard to parallelize. In this paper, we address the challenge of exploiting fine-grain parallelism in a family of DP algorithms known as non-serial polyadic. We use an abstract formulation of non-serial polyadic DP, derived from RNA secondary structure prediction and matrix parenthesization, which are well-known and important problems from this family. We present a load balancing algorithm that achieves the best overall performance with this type of workload on many-core architectures. A divide-and-conquer approach previously used on multi-core architectures is compared against an iterative version. To evaluate these approaches, the algorithm was implemented on three NVIDIA GPUs using CUDA. We achieved up to 10 GFLOP/s performance and up to 228x speedup over the single-threaded CPU implementation. Moreover, the iterative approach results in up to 3.92x speedup over the divide-and-conquer approach.
Citations: 1
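Matrix parenthesization, one of the two problems the authors abstract from, illustrates the non-serial polyadic recurrence: each cell combines pairs of previously computed cells, and the cells on each diagonal are mutually independent, which is the wavefront parallelism a GPU schedule exploits. A minimal sequential Python version of the standard recurrence (an illustration of the problem family, not the authors' CUDA code):

```python
def matrix_chain_cost(dims):
    """Minimum scalar multiplications to evaluate a chain of matrices,
    where matrix i has shape dims[i] x dims[i+1].  The recurrence
      m[i][j] = min over k of m[i][k] + m[k+1][j]
                + dims[i] * dims[k+1] * dims[j+1]
    is non-serial polyadic: each cell depends on a pair of cells."""
    n = len(dims) - 1            # number of matrices in the chain
    m = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):       # one diagonal per chain length
        for i in range(n - length + 1):  # cells on this diagonal are independent
            j = i + length - 1
            m[i][j] = min(m[i][k] + m[k + 1][j]
                          + dims[i] * dims[k + 1] * dims[j + 1]
                          for k in range(i, j))
    return m[0][n - 1]
```

In the iterative GPU version the outer loop over diagonals stays sequential while all cells of a diagonal are computed in parallel; load imbalance arises because diagonals shrink as the computation proceeds.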
GPU Performance Comparison for Accelerated Radar Data Processing
2011 Symposium on Application Accelerators in High-Performance Computing. Pub Date: 2011-07-19. DOI: 10.1109/SAAHPC.2011.14
C. Fallen, B.V.C. Bellamy, G. Newby, B. Watkins
Abstract: Radar is a data-intensive measurement technique often requiring significant processing to make full use of the received signal. However, computing capacity is limited at remote or mobile radar installations, thereby limiting the radar data products available for real-time decisions. We used graphics processing units (GPUs) to accelerate processing of high-resolution phase-coded radar data from the Modular UHF Ionosphere Radar (MUIR) at the High-frequency Active Auroral Research Program (HAARP) facility in Gakona, Alaska. Previously, this data could not be processed on-site in sufficient time to be useful for decisions made during active experiment campaigns, nor could the data be uploaded for off-site processing to high-performance computing (HPC) resources at the Arctic Region Supercomputing Center (ARSC) in Fairbanks. In this paper, we present a radar data-processing performance comparison of a workstation equipped with dual NVIDIA GeForce GTX 480 GPU accelerator cards and a node from ARSC's PACMAN cluster equipped with dual NVIDIA Tesla M2050 cards. Both platforms meet performance requirements, are relatively inexpensive, and could operate effectively at remote observatories such as HAARP.
Citations: 10