A distributed OpenCL framework using redundant computation and data replication

Junghyun Kim, Gangwon Jo, Jaehoon Jung, Jungwon Kim, Jaejin Lee
{"title":"A distributed OpenCL framework using redundant computation and data replication","authors":"Junghyun Kim, Gangwon Jo, Jaehoon Jung, Jungwon Kim, Jaejin Lee","doi":"10.1145/2908080.2908094","DOIUrl":null,"url":null,"abstract":"Applications written solely in OpenCL or CUDA cannot execute on a cluster as a whole. Most previous approaches that extend these programming models to clusters are based on a common idea: designating a centralized host node and coordinating the other nodes with the host for computation. However, the centralized host node is a serious performance bottleneck when the number of nodes is large. In this paper, we propose a scalable and distributed OpenCL framework called SnuCL-D for large-scale clusters. SnuCL-D's remote device virtualization provides an OpenCL application with an illusion that all compute devices in a cluster are confined in a single node. To reduce the amount of control-message and data communication between nodes, SnuCL-D replicates the OpenCL host program execution and data in each node. We also propose a new OpenCL host API function and a queueing optimization technique that significantly reduce the overhead incurred by the previous centralized approaches. To show the effectiveness of SnuCL-D, we evaluate SnuCL-D with a microbenchmark and eleven benchmark applications on a large-scale CPU cluster and a medium-scale GPU cluster.","PeriodicalId":178839,"journal":{"name":"Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"52 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2908080.2908094","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

Abstract

Applications written solely in OpenCL or CUDA cannot execute on a cluster as a whole. Most previous approaches that extend these programming models to clusters are based on a common idea: designating a centralized host node and coordinating the other nodes with the host for computation. However, the centralized host node is a serious performance bottleneck when the number of nodes is large. In this paper, we propose a scalable and distributed OpenCL framework called SnuCL-D for large-scale clusters. SnuCL-D's remote device virtualization provides an OpenCL application with an illusion that all compute devices in a cluster are confined in a single node. To reduce the amount of control-message and data communication between nodes, SnuCL-D replicates the OpenCL host program execution and data in each node. We also propose a new OpenCL host API function and a queueing optimization technique that significantly reduce the overhead incurred by the previous centralized approaches. To show the effectiveness of SnuCL-D, we evaluate SnuCL-D with a microbenchmark and eleven benchmark applications on a large-scale CPU cluster and a medium-scale GPU cluster.
一个使用冗余计算和数据复制的分布式OpenCL框架
仅用OpenCL或CUDA编写的应用程序不能在整个集群上执行。以前将这些编程模型扩展到集群的大多数方法都基于一个共同的思想:指定一个集中的主机节点,并与主机协调其他节点进行计算。但是,当节点数量较大时,集中式主机节点是一个严重的性能瓶颈。在本文中,我们提出了一个可扩展的分布式OpenCL框架,称为SnuCL-D,用于大规模集群。SnuCL-D的远程设备虚拟化为OpenCL应用程序提供了一种错觉,即集群中的所有计算设备都被限制在单个节点中。为了减少节点之间控制消息和数据通信的数量,SnuCL-D在每个节点中复制OpenCL主机程序的执行和数据。我们还提出了一个新的OpenCL主机API函数和队列优化技术,可以显著减少以前集中式方法带来的开销。为了证明SnuCL-D的有效性,我们在大规模CPU集群和中等规模GPU集群上使用微基准测试和11个基准测试应用程序来评估SnuCL-D。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信