数据并行程序中复制注释的自动插入

2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2016-10-01 DOI:10.1109/SBAC-PAD.2016.13

G. Mendonca, B. Guimarães, P. Alves, Fernando Magno Quintão Pereira, M. Pereira, G. Araújo

{"title":"数据并行程序中复制注释的自动插入","authors":"G. Mendonca, B. Guimarães, P. Alves, Fernando Magno Quintão Pereira, M. Pereira, G. Araújo","doi":"10.1109/SBAC-PAD.2016.13","DOIUrl":null,"url":null,"abstract":"Directive-based programming models, such as OpenACC and OpenMP arise today as promising techniques to support the development of parallel applications. These systems allow developers to convert a sequential program into a parallel one with minimum human intervention. However, inserting pragmas into production code is a difficult and error-prone task, often requiring familiarity with the target program. This difficulty restricts the ability of developers to annotate code that they have not written themselves. This paper provides one fundamental component in the solution of this problem. We introduce a static program analysis that infers the bounds of memory regions referenced in source code. Such bounds allow us to automatically insert data-transfer primitives, which are needed when the parallelized code is meant to be executed in an accelerator device, such as a GPU. To validate our ideas, we have applied them onto Polybench, using two different architectures: Nvidia and Qualcomm-based. We have successfully analyzed 98% of all the memory accesses in Polybench. This result has enabled us to insert automatic annotations into those benchmarks leading to speedups of over 100x.","PeriodicalId":361160,"journal":{"name":"2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Automatic Insertion of Copy Annotation in Data-Parallel Programs\",\"authors\":\"G. Mendonca, B. Guimarães, P. Alves, Fernando Magno Quintão Pereira, M. Pereira, G. Araújo\",\"doi\":\"10.1109/SBAC-PAD.2016.13\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Directive-based programming models, such as OpenACC and OpenMP arise today as promising techniques to support the development of parallel applications. These systems allow developers to convert a sequential program into a parallel one with minimum human intervention. However, inserting pragmas into production code is a difficult and error-prone task, often requiring familiarity with the target program. This difficulty restricts the ability of developers to annotate code that they have not written themselves. This paper provides one fundamental component in the solution of this problem. We introduce a static program analysis that infers the bounds of memory regions referenced in source code. Such bounds allow us to automatically insert data-transfer primitives, which are needed when the parallelized code is meant to be executed in an accelerator device, such as a GPU. To validate our ideas, we have applied them onto Polybench, using two different architectures: Nvidia and Qualcomm-based. We have successfully analyzed 98% of all the memory accesses in Polybench. This result has enabled us to insert automatic annotations into those benchmarks leading to speedups of over 100x.\",\"PeriodicalId\":361160,\"journal\":{\"name\":\"2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)\",\"volume\":\"37 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SBAC-PAD.2016.13\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SBAC-PAD.2016.13","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 15

摘要

基于指令的编程模型，如OpenACC和OpenMP，作为支持并行应用程序开发的有前途的技术而出现。这些系统允许开发人员以最少的人为干预将顺序程序转换为并行程序。然而，在生产代码中插入pragma是一项困难且容易出错的任务，通常需要熟悉目标程序。这种困难限制了开发人员注释不是他们自己编写的代码的能力。本文提供了解决这一问题的一个基本组成部分。我们介绍了一个静态程序分析来推断源代码中引用的内存区域的边界。这样的边界允许我们自动插入数据传输原语，当并行代码打算在加速器设备(如GPU)中执行时，这是需要的。为了验证我们的想法，我们将它们应用到Polybench上，使用两种不同的架构:基于Nvidia和高通的架构。我们已经成功地分析了Polybench中98%的内存访问。这个结果使我们能够在这些基准测试中插入自动注释，从而使速度提高100倍以上。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Automatic Insertion of Copy Annotation in Data-Parallel Programs

Directive-based programming models, such as OpenACC and OpenMP arise today as promising techniques to support the development of parallel applications. These systems allow developers to convert a sequential program into a parallel one with minimum human intervention. However, inserting pragmas into production code is a difficult and error-prone task, often requiring familiarity with the target program. This difficulty restricts the ability of developers to annotate code that they have not written themselves. This paper provides one fundamental component in the solution of this problem. We introduce a static program analysis that infers the bounds of memory regions referenced in source code. Such bounds allow us to automatically insert data-transfer primitives, which are needed when the parallelized code is meant to be executed in an accelerator device, such as a GPU. To validate our ideas, we have applied them onto Polybench, using two different architectures: Nvidia and Qualcomm-based. We have successfully analyzed 98% of all the memory accesses in Polybench. This result has enabled us to insert automatic annotations into those benchmarks leading to speedups of over 100x.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

自引率

0.00%

发文量