基于指令的贴图抽象，在加速器上分配循环

Proceedings of the General Purpose GPUs Pub Date : 2017-02-04 DOI:10.1145/3038228.3038238

T. Vanderbruggen, John Cavazos, C. Liao, D. Quinlan

{"title":"基于指令的贴图抽象，在加速器上分配循环","authors":"T. Vanderbruggen, John Cavazos, C. Liao, D. Quinlan","doi":"10.1145/3038228.3038238","DOIUrl":null,"url":null,"abstract":"Optimizing applications for the next generation of super-computers requires next generation compilers. These compilers need to provide an abstraction for the developer to describe the inner working of applications. And, next generation compilers need to be able to intelligently apply optimizations to a wide variety of algorithms solved by scientific applications. They need to optimize applications for any workload targeting any architecture. In this paper, we present an important component of any next generation supercomputer compiler that we call TileK. TileK is a tile abstraction used to generate distributed kernels from nested loops. It provides a high-level abstraction used to decompose the iteration space of loop nests. Its directives-based language enables an effective and efficient placement of multi-dimensional computations on the 3D topology of accelerators (e.g. graphics processing units, GPUs). We implemented both the tile abstraction and the kernel generator in ROSE Compiler. We used TileK to parallelize linear algebra kernels and stencils, targeting multicore CPUs (pThread) and GPUs (OpenCL). TileK enabled us to explore and evaluate a large optimization space of many versions of these kernels for varying input sizes. Our results shows that the selection of a given optimization for a specific input size is a challenging problem.","PeriodicalId":108772,"journal":{"name":"Proceedings of the General Purpose GPUs","volume":"98 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Directive-based tile abstraction to distribute loops on accelerators\",\"authors\":\"T. Vanderbruggen, John Cavazos, C. Liao, D. Quinlan\",\"doi\":\"10.1145/3038228.3038238\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Optimizing applications for the next generation of super-computers requires next generation compilers. These compilers need to provide an abstraction for the developer to describe the inner working of applications. And, next generation compilers need to be able to intelligently apply optimizations to a wide variety of algorithms solved by scientific applications. They need to optimize applications for any workload targeting any architecture. In this paper, we present an important component of any next generation supercomputer compiler that we call TileK. TileK is a tile abstraction used to generate distributed kernels from nested loops. It provides a high-level abstraction used to decompose the iteration space of loop nests. Its directives-based language enables an effective and efficient placement of multi-dimensional computations on the 3D topology of accelerators (e.g. graphics processing units, GPUs). We implemented both the tile abstraction and the kernel generator in ROSE Compiler. We used TileK to parallelize linear algebra kernels and stencils, targeting multicore CPUs (pThread) and GPUs (OpenCL). TileK enabled us to explore and evaluate a large optimization space of many versions of these kernels for varying input sizes. Our results shows that the selection of a given optimization for a specific input size is a challenging problem.\",\"PeriodicalId\":108772,\"journal\":{\"name\":\"Proceedings of the General Purpose GPUs\",\"volume\":\"98 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-02-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the General Purpose GPUs\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3038228.3038238\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the General Purpose GPUs","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3038228.3038238","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

为下一代超级计算机优化应用程序需要下一代编译器。这些编译器需要为开发人员提供一个抽象来描述应用程序的内部工作。而且，下一代编译器需要能够智能地将优化应用于科学应用程序解决的各种算法。他们需要针对针对任何架构的任何工作负载优化应用程序。在本文中，我们提出了下一代超级计算机编译器的一个重要组成部分，我们称之为TileK。TileK是一个tile抽象，用于从嵌套循环生成分布式内核。它提供了一个高级抽象，用于分解循环巢的迭代空间。它的基于指令的语言使得在加速器(例如图形处理单元，gpu)的3D拓扑上有效和高效地放置多维计算。我们在ROSE Compiler中实现了tile抽象和内核生成器。我们使用TileK来并行化线性代数内核和模板，目标是多核cpu (pThread)和gpu (OpenCL)。TileK使我们能够为不同的输入大小探索和评估这些内核的许多版本的大型优化空间。我们的结果表明，对于特定的输入大小选择给定的优化是一个具有挑战性的问题。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Directive-based tile abstraction to distribute loops on accelerators

Optimizing applications for the next generation of super-computers requires next generation compilers. These compilers need to provide an abstraction for the developer to describe the inner working of applications. And, next generation compilers need to be able to intelligently apply optimizations to a wide variety of algorithms solved by scientific applications. They need to optimize applications for any workload targeting any architecture. In this paper, we present an important component of any next generation supercomputer compiler that we call TileK. TileK is a tile abstraction used to generate distributed kernels from nested loops. It provides a high-level abstraction used to decompose the iteration space of loop nests. Its directives-based language enables an effective and efficient placement of multi-dimensional computations on the 3D topology of accelerators (e.g. graphics processing units, GPUs). We implemented both the tile abstraction and the kernel generator in ROSE Compiler. We used TileK to parallelize linear algebra kernels and stencils, targeting multicore CPUs (pThread) and GPUs (OpenCL). TileK enabled us to explore and evaluate a large optimization space of many versions of these kernels for varying input sizes. Our results shows that the selection of a given optimization for a specific input size is a challenging problem.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the General Purpose GPUs

自引率

0.00%

发文量