Transparent GPU Execution of NumPy Applications

2014 IEEE International Parallel & Distributed Processing Symposium Workshops. Pub Date: 2014-05-19. DOI: 10.1109/IPDPSW.2014.114

Troels Blum, M. R. B. Kristensen, B. Vinter

Citations: 8

Abstract

In this work, we present a back-end for the Python library NumPy that utilizes the GPU seamlessly. We use dynamic code generation to generate kernels, and data is moved transparently to and from the GPU. For the integration into NumPy, we use the Bohrium runtime system. Bohrium hooks into NumPy through the implicit data parallelization of array operations; this approach requires no annotations or other code modifications. The key motivation for our GPU computation back-end is to transform high-level Python/NumPy applications into low-level GPU executable kernels, with the goal of obtaining high performance, high productivity, and high portability (HP3). We provide a performance study of the GPU back-end that includes four well-known benchmark applications, Black-Scholes, Successive Over-relaxation, Shallow Water, and N-body, all implemented in pure Python/NumPy. We demonstrate an impressive 834 times speedup for the Black-Scholes application, and an average speedup of 124 times across the four benchmarks.
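To illustrate the array-level style the abstract refers to, here is a minimal sketch of a Black-Scholes call pricer written only with NumPy array operations. It is not the authors' benchmark code: the function names `cnd` and `black_scholes_call`, the sample parameter values, and the use of the Abramowitz-Stegun polynomial approximation for the normal CDF are all assumptions made for illustration.

```python
import numpy as np


def cnd(x):
    """Approximate the standard normal CDF element-wise.

    Uses the Abramowitz-Stegun polynomial approximation (an assumption
    chosen here because it needs only basic array operations).
    """
    a = np.abs(x)
    t = 1.0 / (1.0 + 0.2316419 * a)
    # Horner evaluation of the degree-5 polynomial in t
    poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
           + t * (-1.821255978 + t * 1.330274429))))
    # Upper-tail probability for |x|
    tail = np.exp(-0.5 * a * a) / np.sqrt(2.0 * np.pi) * poly
    # Use symmetry of the normal distribution for negative inputs
    return np.where(x >= 0, 1.0 - tail, tail)


def black_scholes_call(s, k, t, r, sigma):
    """European call price; every argument may be a scalar or an array."""
    d1 = (np.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * np.sqrt(t))
    d2 = d1 - sigma * np.sqrt(t)
    return s * cnd(d1) - k * np.exp(-r * t) * cnd(d2)


# Hypothetical inputs: one million spot prices, fixed strike and rates
spot = np.linspace(50.0, 150.0, 1_000_000)
prices = black_scholes_call(spot, 100.0, 1.0, 0.05, 0.2)
```

Every statement in the pricing function is a whole-array operation with no explicit Python loop. Per the abstract, this is the kind of code that the Bohrium-based back-end is meant to turn into GPU kernels without annotations or source changes.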

