Language Constructs and Semantics for Runtime-independent Parallelism Expression on Heterogeneous Systems

Shusen Wu, Xiaoshe Dong, Yufei Wang, Weiduo Chen
2019 IEEE 5th International Conference on Computer and Communications (ICCC), pp. 1269-1275, December 2019
DOI: 10.1109/ICCC47050.2019.9064451
Citations: 1

Abstract

The emergence of heterogeneous processors such as GPUs provides massive parallel computing power but also makes parallel programming harder. Low-level programming models such as CUDA and OpenCL can deliver good performance, but programming productivity is poor and applications lack portability. In this paper, we present Ruler, a core language that extends C with high-level parallel constructs. These constructs let programmers express parallelism in their programs without dealing with runtime details, which simplifies programming. We present the operational semantics of the language and show how these constructs preserve the parallel patterns and the degree of parallelism of high-level applications. This information can guide the compiler in generating efficient code and in maintaining performance across platforms. We have implemented a compiler and runtime system for Ruler on top of OpenCL. Multiple benchmarks were rebuilt with Ruler and evaluated on an NVIDIA GPU and on an Intel MIC platform to demonstrate the effectiveness of our techniques. Ruler code is only 13%-64% the size of the corresponding OpenCL code. After compilation, the rebuilt benchmarks execute smoothly on both platforms, with performance competitive with that of the handcrafted OpenCL benchmark code.
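The abstract does not show Ruler's syntax. The following is therefore a minimal Python sketch of the general idea only, not Ruler's actual constructs. In this sketch, a high-level construct states *what* is parallel, such as a map or a reduction over `n` independent items. It records that pattern and its degree of parallelism so a compiler could later decide *how* to execute it. All names (`Pattern`, `ParallelInfo`, `par_map`, `par_reduce`, `choose_launch`) are hypothetical, and so is the launch-size heuristic, which merely illustrates how a preserved degree of parallelism might inform an OpenCL-style global/local work size.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Sequence, Tuple


class Pattern(Enum):
    """Parallel patterns a high-level construct might record (hypothetical)."""
    MAP = "map"
    REDUCE = "reduce"


@dataclass(frozen=True)
class ParallelInfo:
    """Information a construct could preserve for the compiler (hypothetical)."""
    pattern: Pattern
    degree: int  # number of independent work items the program exposes


def par_map(f: Callable[[float], float],
            xs: Sequence[float]) -> Tuple[List[float], ParallelInfo]:
    """Map construct: f is applied to every element independently.

    This sequential stand-in backend decides how to run it; a real backend
    could instead launch one work-item per element.
    """
    return [f(x) for x in xs], ParallelInfo(Pattern.MAP, len(xs))


def par_reduce(op: Callable[[float, float], float], xs: Sequence[float],
               init: float) -> Tuple[float, ParallelInfo]:
    """Reduce construct: combines all elements with an associative op."""
    acc = init
    for x in xs:  # sequential fold; a parallel backend could use a tree reduction
        acc = op(acc, x)
    return acc, ParallelInfo(Pattern.REDUCE, len(xs))


def choose_launch(info: ParallelInfo, max_local: int) -> Tuple[int, int]:
    """Hypothetical heuristic: map the preserved degree of parallelism to an
    OpenCL-style (global_size, local_size) pair, padding global_size up to a
    multiple of local_size."""
    local = max(1, min(max_local, info.degree))
    global_size = -(-info.degree // local) * local  # ceil(degree / local) * local
    return global_size, local


squares, map_info = par_map(lambda x: x * x, [1.0, 2.0, 3.0, 4.0])
total, red_info = par_reduce(lambda a, b: a + b, squares, 0.0)
print(total)                                                # 30.0
print(choose_launch(map_info, max_local=256))               # (4, 4)
print(choose_launch(ParallelInfo(Pattern.MAP, 1000), 256))  # (1024, 256)
```

In this sketch, the pattern and degree are kept as data rather than hard-coded as work-group sizes in a kernel. That is what would allow one backend to pick different launch configurations for, say, a GPU and a MIC device from the same source. How Ruler itself does this is not specified in the abstract.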