G. Mendonca, B. Guimarães, P. Alves, Fernando Magno Quintão Pereira, M. Pereira, G. Araújo
{"title":"数据并行程序中复制注释的自动插入","authors":"G. Mendonca, B. Guimarães, P. Alves, Fernando Magno Quintão Pereira, M. Pereira, G. Araújo","doi":"10.1109/SBAC-PAD.2016.13","DOIUrl":null,"url":null,"abstract":"Directive-based programming models, such as OpenACC and OpenMP arise today as promising techniques to support the development of parallel applications. These systems allow developers to convert a sequential program into a parallel one with minimum human intervention. However, inserting pragmas into production code is a difficult and error-prone task, often requiring familiarity with the target program. This difficulty restricts the ability of developers to annotate code that they have not written themselves. This paper provides one fundamental component in the solution of this problem. We introduce a static program analysis that infers the bounds of memory regions referenced in source code. Such bounds allow us to automatically insert data-transfer primitives, which are needed when the parallelized code is meant to be executed in an accelerator device, such as a GPU. To validate our ideas, we have applied them onto Polybench, using two different architectures: Nvidia and Qualcomm-based. We have successfully analyzed 98% of all the memory accesses in Polybench. This result has enabled us to insert automatic annotations into those benchmarks leading to speedups of over 100x.","PeriodicalId":361160,"journal":{"name":"2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Automatic Insertion of Copy Annotation in Data-Parallel Programs\",\"authors\":\"G. Mendonca, B. Guimarães, P. Alves, Fernando Magno Quintão Pereira, M. Pereira, G. Araújo\",\"doi\":\"10.1109/SBAC-PAD.2016.13\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Directive-based programming models, such as OpenACC and OpenMP arise today as promising techniques to support the development of parallel applications. These systems allow developers to convert a sequential program into a parallel one with minimum human intervention. However, inserting pragmas into production code is a difficult and error-prone task, often requiring familiarity with the target program. This difficulty restricts the ability of developers to annotate code that they have not written themselves. This paper provides one fundamental component in the solution of this problem. We introduce a static program analysis that infers the bounds of memory regions referenced in source code. Such bounds allow us to automatically insert data-transfer primitives, which are needed when the parallelized code is meant to be executed in an accelerator device, such as a GPU. To validate our ideas, we have applied them onto Polybench, using two different architectures: Nvidia and Qualcomm-based. We have successfully analyzed 98% of all the memory accesses in Polybench. This result has enabled us to insert automatic annotations into those benchmarks leading to speedups of over 100x.\",\"PeriodicalId\":361160,\"journal\":{\"name\":\"2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)\",\"volume\":\"37 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SBAC-PAD.2016.13\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SBAC-PAD.2016.13","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Automatic Insertion of Copy Annotation in Data-Parallel Programs
Directive-based programming models, such as OpenACC and OpenMP arise today as promising techniques to support the development of parallel applications. These systems allow developers to convert a sequential program into a parallel one with minimum human intervention. However, inserting pragmas into production code is a difficult and error-prone task, often requiring familiarity with the target program. This difficulty restricts the ability of developers to annotate code that they have not written themselves. This paper provides one fundamental component in the solution of this problem. We introduce a static program analysis that infers the bounds of memory regions referenced in source code. Such bounds allow us to automatically insert data-transfer primitives, which are needed when the parallelized code is meant to be executed in an accelerator device, such as a GPU. To validate our ideas, we have applied them onto Polybench, using two different architectures: Nvidia and Qualcomm-based. We have successfully analyzed 98% of all the memory accesses in Polybench. This result has enabled us to insert automatic annotations into those benchmarks leading to speedups of over 100x.