An OpenACC Extension for Data Layout Transformation

2014 First Workshop on Accelerator Programming using Directives. Pub Date: 2014-11-16. DOI: 10.1109/WACCPD.2014.12

Tetsuya Hoshino, N. Maruyama, S. Matsuoka


Citations: 13

Abstract





OpenACC is gaining momentum as an implicit and portable interface for porting legacy CPU-based applications to heterogeneous, highly parallel computational environments that include many-core accelerators such as GPUs and the Intel Xeon Phi. OpenACC provides a set of OpenMP-like loop directives for parallelization and for managing data movement, attaining functional portability across different heterogeneous devices. However, the performance portability of OpenACC is said to be insufficient because target devices differ in their characteristics, especially in their preferred memory layouts, and it is currently difficult for compilers to adapt to these differences automatically. We are working on a set of directives that give compilers better semantic information for such adaptation; here, we focus on data layout, in particular the Structure of Arrays (SoA), which is advantageous on GPUs, as opposed to the Array of Structures (AoS), which performs well on CPUs. We propose a directive extension to OpenACC that allows users to flexibly specify the optimal layout, even for nested data structures. Performance results show gains of up to 96% on CPUs and 165% on GPUs compared with programs without such directives, essentially attaining both functional and performance portability in OpenACC.
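To make the AoS/SoA distinction concrete, the following is a minimal C sketch, assuming a hypothetical particle record with a nested `vec3` member. It performs by hand the kind of AoS-to-SoA conversion that the proposed directives are meant to let the compiler handle; it does not reproduce the paper's directive syntax. All type and function names (`particle`, `particles_soa`, `aos_to_soa`, `soa_to_aos`, `weighted_x_sum`) are illustrative assumptions. The `#pragma acc parallel loop` line uses standard OpenACC clauses and is ignored by compilers built without OpenACC support.

```c
#include <stdlib.h>

/* Array of Structures (AoS): all fields of one particle are adjacent in
 * memory. The nested vec3 mirrors the "nested data structure" case. */
typedef struct { float x, y, z; } vec3;
typedef struct { vec3 pos; float mass; } particle;

/* Structure of Arrays (SoA): each field, including each member of the
 * nested vec3, is stored in its own contiguous array. */
typedef struct {
    size_t n;
    float *x, *y, *z;
    float *mass;
} particles_soa;

/* Release SoA storage; safe on partially allocated state (free(NULL) is a no-op). */
static void soa_free(particles_soa *s) {
    free(s->x); free(s->y); free(s->z); free(s->mass);
    s->x = s->y = s->z = s->mass = NULL;
    s->n = 0;
}

/* Allocate SoA storage for n particles; returns 0 on success, -1 on failure. */
static int soa_alloc(particles_soa *s, size_t n) {
    s->n = n;
    s->x = malloc(n * sizeof *s->x);
    s->y = malloc(n * sizeof *s->y);
    s->z = malloc(n * sizeof *s->z);
    s->mass = malloc(n * sizeof *s->mass);
    if (!s->x || !s->y || !s->z || !s->mass) { soa_free(s); return -1; }
    return 0;
}

/* Scatter AoS records into per-field arrays, flattening the nested pos. */
static void aos_to_soa(const particle *a, particles_soa *s) {
    for (size_t i = 0; i < s->n; ++i) {
        s->x[i] = a[i].pos.x;
        s->y[i] = a[i].pos.y;
        s->z[i] = a[i].pos.z;
        s->mass[i] = a[i].mass;
    }
}

/* Gather per-field arrays back into AoS records. */
static void soa_to_aos(const particles_soa *s, particle *a) {
    for (size_t i = 0; i < s->n; ++i) {
        a[i].pos.x = s->x[i];
        a[i].pos.y = s->y[i];
        a[i].pos.z = s->z[i];
        a[i].mass = s->mass[i];
    }
}

/* Example kernel on the SoA layout: sum of mass * x. Consecutive iterations
 * read consecutive floats from x and mass (unit stride). */
static float weighted_x_sum(const particles_soa *s) {
    const float *x = s->x;
    const float *m = s->mass;
    size_t n = s->n;
    float sum = 0.0f;
    #pragma acc parallel loop reduction(+:sum) copyin(x[0:n], m[0:n])
    for (size_t i = 0; i < n; ++i)
        sum += m[i] * x[i];
    return sum;
}
```

In the SoA form, neighboring loop iterations access neighboring elements of `x` and `mass`, which typically enables coalesced memory accesses on GPUs; the same loop over the AoS array strides across 16-byte `particle` records instead. Keeping the conversion in separate functions is only a stand-in here: the point of the proposed extension is to let the compiler choose and apply the layout per target rather than having the programmer maintain both versions.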

