T. Vanderbruggen, John Cavazos, C. Liao, D. Quinlan
Proceedings of the General Purpose GPUs, 2017-02-04. DOI: 10.1145/3038228.3038238
Directive-based tile abstraction to distribute loops on accelerators
Optimizing applications for the next generation of supercomputers requires next-generation compilers. These compilers need to provide an abstraction that lets developers describe the inner workings of their applications. They also need to apply optimizations intelligently to the wide variety of algorithms used in scientific applications, optimizing any workload for any target architecture. In this paper, we present TileK, an important component of any next-generation supercomputer compiler. TileK is a tile abstraction used to generate distributed kernels from nested loops. It provides a high-level abstraction for decomposing the iteration space of loop nests. Its directive-based language enables effective and efficient placement of multi-dimensional computations onto the 3D topology of accelerators (e.g., graphics processing units, GPUs). We implemented both the tile abstraction and the kernel generator in the ROSE compiler. We used TileK to parallelize linear algebra kernels and stencils, targeting multicore CPUs (pThread) and GPUs (OpenCL). TileK enabled us to explore and evaluate a large optimization space comprising many versions of these kernels for varying input sizes. Our results show that selecting the right optimization for a specific input size is a challenging problem.
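The abstract describes tiling as decomposing the iteration space of a loop nest and placing the resulting tiles on an accelerator's grid. The C sketch below illustrates that general idea by hand on a matrix-multiply nest. It is not TileK: it uses neither TileK's directive syntax nor ROSE, and the function names (`num_tiles`, `matmul_tiled`, `tiled_matches_reference`) are hypothetical. The mapping of tiles to work-groups and of in-tile points to work-items is only simulated with serial loops, and only two of an accelerator's three grid dimensions are used.

```c
#include <stdlib.h>

/* Number of tiles of size t needed to cover n iterations (ceiling division). */
static int num_tiles(int n, int t) { return (n + t - 1) / t; }

/* Reference loop nest: C = A * B for n x n row-major matrices. */
static void matmul_ref(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double acc = 0.0;
            for (int k = 0; k < n; k++)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
}

/* Tiled version of the same nest (hypothetical illustration, not TileK output).
   The (i, j) space is cut into ti x tj tiles: the outer (bi, bj) loops
   enumerate tiles, as a 2D grid of work-groups would, and the inner (ii, jj)
   loops enumerate points inside one tile, as that group's work-items would.
   Everything runs serially here. */
static void matmul_tiled(int n, int ti, int tj,
                         const double *A, const double *B, double *C) {
    for (int bi = 0; bi < num_tiles(n, ti); bi++)
        for (int bj = 0; bj < num_tiles(n, tj); bj++)
            for (int ii = 0; ii < ti; ii++)
                for (int jj = 0; jj < tj; jj++) {
                    int i = bi * ti + ii, j = bj * tj + jj;
                    if (i >= n || j >= n)
                        continue; /* skip padding in partial edge tiles */
                    double acc = 0.0;
                    for (int k = 0; k < n; k++)
                        acc += A[i * n + k] * B[k * n + j];
                    C[i * n + j] = acc;
                }
}

/* Returns 1 if the tiled nest reproduces the reference result for the given
   problem size and tile shape, 0 otherwise (or on allocation failure). */
static int tiled_matches_reference(int n, int ti, int tj) {
    size_t sz = (size_t)n * (size_t)n;
    double *A = malloc(sz * sizeof *A), *B = malloc(sz * sizeof *B);
    double *R = malloc(sz * sizeof *R), *T = malloc(sz * sizeof *T);
    int ok = A && B && R && T;
    if (ok) {
        for (size_t e = 0; e < sz; e++) {
            A[e] = (double)(e % 5) - 2.0; /* small integers keep sums exact */
            B[e] = (double)(e % 3) + 1.0;
            T[e] = -1.0e300;              /* sentinel exposes skipped points */
        }
        matmul_ref(n, A, B, R);
        matmul_tiled(n, ti, tj, A, B, T);
        for (size_t e = 0; e < sz && ok; e++)
            ok = (R[e] == T[e]);
    }
    free(A); free(B); free(R); free(T);
    return ok;
}
```

Calling `tiled_matches_reference(7, 2, 3)` checks that a 2 x 3 tiling of a 7 x 7 problem, including its partial edge tiles, visits every output element and reproduces the untiled result. Each choice of `ti` and `tj` yields a different version of the same kernel; this is the kind of version space the abstract refers to when it notes that the best choice depends on the input size.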