Implementation of XcalableMP Device Acceleration Extention with OpenCL

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum. Pub Date: 2012-05-21. DOI: 10.1109/IPDPSW.2012.296

Takuma Nomizu, D. Takahashi, Jinpil Lee, T. Boku, M. Sato


Citations: 8

Abstract

Due to their outstanding computational performance, many acceleration devices, such as GPUs, the Cell Broadband Engine (Cell/B.E.), and multi-core computing, are attracting a lot of attention in the field of high-performance computing. Although there are many programming models and languages designed for programming accelerators, such as CUDA, AMD Accelerated Parallel Processing (AMD APP), and OpenCL, these models remain difficult and complex. Furthermore, when programming for accelerator-enhanced clusters, we have to use an inter-node programming interface, such as MPI, to coordinate the nodes. To address these problems and reduce complexity, we propose an extension to XcalableMP (XMP), a PGAS language, for use on accelerator-enhanced clusters, called the XcalableMP Device Acceleration Extension (XMP-dev). In XMP-dev, globally distributed data is mapped onto the distributed memory of each accelerator, and a code fragment can be offloaded for execution on a set of accelerators. This eliminates the complex programming otherwise required between nodes and accelerators and between nodes. In this paper, we present an implementation of the XMP-dev runtime library with the OpenCL APIs, whereas the previous implementation targeted CUDA only. Since OpenCL is a standardized interface supported by various kinds of accelerators, it improves the portability of XMP-dev and reduces the cost of development. In our performance evaluation, we show that the OpenCL implementation of XMP-dev can generate portable programs that run not only on NVIDIA GPU-enhanced clusters but also on various other accelerator-enhanced clusters.
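The mapping of globally distributed data onto per-accelerator memory can be illustrated with a small sketch. This is not XMP-dev syntax and not the authors' runtime: it is a plain Python emulation that assumes a block distribution with chunk size ceil(N/P), and the names `block_range` and `offload_saxpy` are hypothetical helpers introduced only for this illustration. Each "node" copies its block to a stand-in device buffer, updates it locally, and writes the result back.

```python
import math


def block_range(n, nprocs, rank):
    """Return the half-open [lo, hi) index range of a length-n global
    array owned by `rank`, assuming a block distribution (assumption)."""
    chunk = math.ceil(n / nprocs)
    lo = min(rank * chunk, n)
    hi = min(lo + chunk, n)
    return lo, hi


def offload_saxpy(a, x, y, nprocs):
    """Emulate each node's accelerator computing y = a*x + y on the
    block of the global arrays that the node owns."""
    n = len(x)
    for rank in range(nprocs):
        lo, hi = block_range(n, nprocs, rank)
        # Stand-in for the host-to-device copy of the local block.
        x_dev = x[lo:hi]
        y_dev = y[lo:hi]
        # Stand-in for the offloaded kernel running on the accelerator.
        y_dev = [a * xi + yi for xi, yi in zip(x_dev, y_dev)]
        # Stand-in for the device-to-host copy of the result.
        y[lo:hi] = y_dev
    return y


if __name__ == "__main__":
    for rank in range(4):
        print(rank, block_range(10, 4, rank))
    print(offload_saxpy(2, [1, 2, 3, 4, 5], [1, 1, 1, 1, 1], 2))
```

In a real accelerator-enhanced cluster the loop over `rank` would be replaced by one process per node, and the slice copies by explicit transfers through an accelerator API such as OpenCL; the sketch only shows which part of the global index space each accelerator would hold.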


