The Challenges of Writing Portable, Correct and High Performance Libraries for GPUs.

Computer Architecture News. Pub Date: 2011-09-01. DOI: 10.1145/2082156.2082158

Miriam Leeser, Devon Yablonski, Dana Brooks, Laurie Smith King

Volume 39, Issue 4, pages 2-7. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3691863/pdf/nihms369666.pdf


Abstract

Graphics Processing Units (GPUs) are widely used to accelerate scientific applications. Many successes have been reported, with speedups of two or three orders of magnitude over serial implementations of the same algorithms. These speedups typically pertain to a specific implementation with fixed parameters, mapped to a specific hardware platform. Such implementations are not designed to be easily ported to other GPUs, even ones from the same manufacturer. When the target hardware changes, the application must be re-optimized. In this paper, we address a different problem. We aim to deliver working, efficient GPU code in a library that is downloaded and run by many different users. The challenge is to deliver efficiency independent of the individual user's parameters and without a priori knowledge of the hardware the user will employ. This problem requires a different set of tradeoffs than finding the best runtime for a single solution. Solutions must adapt to a range of parameters, both to solve users' problems and to make the best use of the target hardware. Another issue is the integration of GPUs into a Problem Solving Environment (PSE), where the use of a GPU should be almost invisible to the user. Ease of use and smooth interaction with the existing user interface are important to our approach. We illustrate our solution by incorporating GPU processing into SCIRun, the biomedical PSE developed by the Scientific Computing and Imaging (SCI) Institute at the University of Utah. SCIRun allows scientists to interactively construct many different types of biomedical simulations. We use this environment to demonstrate the effectiveness of the GPU by accelerating time-consuming algorithms in scientists' simulations. Specifically, we target the linear solver module, including the Conjugate Gradient, Jacobi and MinRes solvers for sparse matrices.
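To make the solver target concrete, the following is a minimal CPU-side sketch of Conjugate Gradient on a sparse matrix in compressed sparse row (CSR) form. It is not the paper's GPU implementation: the function names, the CSR layout, the relative-residual stopping rule, and the default tolerance are all assumptions chosen for illustration, and the matrix is assumed to be symmetric positive definite. In a GPU version, the sparse matrix-vector product and the dot products are the operations that would typically be parallelized.

```python
from typing import List, Sequence, Tuple


def csr_matvec(values: Sequence[float], col_idx: Sequence[int],
               row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Return y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row live in values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(values: Sequence[float], col_idx: Sequence[int],
                       row_ptr: Sequence[int], b: Sequence[float],
                       tol: float = 1e-10,
                       max_iter: int = 1000) -> Tuple[List[float], int]:
    """Solve A x = b for symmetric positive definite A (assumed).

    Stops when ||r|| <= tol * ||b||; returns (x, iterations used).
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    b_norm_sq = dot(b, b)
    if rs_old == 0.0:
        return x, 0      # b is zero, so x = 0 is the solution
    for iteration in range(1, max_iter + 1):
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new <= tol * tol * b_norm_sq:
            return x, iteration
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


# Usage: A = [[4, 1], [1, 3]], b = [1, 2]; the exact solution is [1/11, 7/11].
values = [4.0, 1.0, 1.0, 3.0]
col_idx = [0, 1, 0, 1]
row_ptr = [0, 2, 4]
x, iters = conjugate_gradient(values, col_idx, row_ptr, [1.0, 2.0])
```

CSR is used here only because it is a common sparse format; the abstract does not say which storage format the library uses.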
