ELMO: A User-Friendly API to Enable Local Memory in OpenCL Kernels

2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
Pub Date: 2013-02-27
DOI: 10.1109/pdp.2013.61

Jianbin Fang, A. Varbanescu, Jie Shen, H. Sips

Citations: 11

Abstract

Recent parallel architectures are equipped with local memory, which simplifies hardware design at the cost of increased program complexity due to explicit management. To ease this extra burden on programmers, we introduce ELMO, an easy-to-use API that improves productivity while preserving the high performance of local memory operations. Specifically, ELMO is a generic API that covers different local memory use cases. We also present prototype implementations of these APIs and apply multiple GPU-inspired optimizations to maximize their performance. Experimental results on the NVIDIA Quadro5000 GPU show that using ELMO significantly improves performance over native implementations: the achieved speedup ranges from 1.3x to 3.7x. Furthermore, with ELMO we still achieve performance comparable to (if not better than) that of hand-tuned applications, while the code is shorter, clearer, and safer.
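To make the "explicit management" burden concrete, the following is a minimal sketch in plain C that emulates what a kernel author typically writes by hand when staging data in local memory. It is not ELMO's API, which the abstract does not show. The 3-point stencil, the names `stencil_native`, `stencil_local`, and `load_or_zero`, and the values of `GROUP_SIZE` and `HALO` are all assumptions chosen for illustration, and the `tile` array merely stands in for a `__local` buffer.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 4   /* hypothetical work-group size */
#define HALO 1         /* stencil radius (assumed) */

/* Read in[i], treating everything outside [0, n) as zero. */
static float load_or_zero(const float *in, size_t n, long i) {
    return (i >= 0 && (size_t)i < n) ? in[i] : 0.0f;
}

/* Version without local memory: each emulated work-item reads its
   three inputs directly from "global" memory. */
void stencil_native(const float *in, float *out, size_t n) {
    for (size_t gid = 0; gid < n; ++gid) {
        long i = (long)gid;
        out[gid] = load_or_zero(in, n, i - 1)
                 + load_or_zero(in, n, i)
                 + load_or_zero(in, n, i + 1);
    }
}

/* Hand-managed local-memory version: each emulated work-group first
   stages its tile plus halo cells, then computes from the tile only. */
void stencil_local(const float *in, float *out, size_t n) {
    assert(n % GROUP_SIZE == 0);  /* sketch handles whole groups only */
    for (size_t group = 0; group < n / GROUP_SIZE; ++group) {
        float tile[GROUP_SIZE + 2 * HALO];  /* stands in for __local */
        long base = (long)(group * GROUP_SIZE) - HALO;

        /* Phase 1: cooperative load, including index remapping and
           boundary handling for the halo cells. */
        for (size_t t = 0; t < GROUP_SIZE + 2 * HALO; ++t)
            tile[t] = load_or_zero(in, n, base + (long)t);

        /* A real OpenCL kernel needs barrier(CLK_LOCAL_MEM_FENCE) here
           so that all work-items see the fully loaded tile. */

        /* Phase 2: compute using local indices shifted by the halo. */
        for (size_t lid = 0; lid < GROUP_SIZE; ++lid) {
            size_t gid = group * GROUP_SIZE + lid;
            out[gid] = tile[lid] + tile[lid + 1] + tile[lid + 2];
        }
    }
}
```

Both functions produce the same output, but the second carries the extra tile sizing, halo loading, index remapping, and synchronization that the abstract identifies as the cost of explicit local memory management.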




