An Extensible Framework for Composing Stencils with Common Scientific Computing Patterns

Proceedings of the Second Workshop on Optimizing Stencil Computations. Pub Date: 2014-10-20. DOI: 10.1145/2686745.2686750

L. Truong, Chick Markley, A. Fox


Citations: 1

Abstract

The SEJITS framework supports creating embedded domain-specific languages (DSELs) and code generators, a pair of which is called a specializer, with much less effort than creating a full DSL compiler, typically just a few hundred lines of code. SEJITS' main benefit is allowing application writers to stay entirely in high-level languages such as Python by using specialized Python functions (that is, functions written in one of the Python-embedded DSELs) to generate code that runs at native speed. One existing SEJITS DSEL is Sepya [10], a Python DSEL for stencil computations that generates OpenMP and Cilk+ code competitive with existing DSL compilers such as Pochoir and Halide. We extend Sepya to generate OpenCL code for targeting GPUs, and in the process, extend SEJITS with support for meta-specializers, whose job is to enable and optimize the composition of existing specializers written by third parties. In this work, we demonstrate meta-specialization by detecting and removing extraneous data copies to and from the GPU when composing multiple specializer calls (stencil and non-stencil). We also explore variants of loop fusion to further improve the performance of composing these operations. The generated stencil code is 20x faster than SciPy and competitive with existing stencil DSELs on realistic code excerpts. Since meta-specializers must compose and optimize specializers created by third parties, we extend SEJITS with support for meta-specializer hooks, allowing existing specializers to be incrementally enabled for meta-specialization without breaking backwards compatibility. The Sepya and SEJITS extensions together extend the range of platforms for which highly optimized code can be generated and open new possibilities for optimizing the composition of existing specializers.
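The two optimizations named in the abstract, removing extraneous host-device copies between composed calls and fusing loops across them, can be illustrated with a toy model. The sketch below is not the SEJITS or Sepya API: `FakeDevice`, `stencil_kernel`, `scale_kernel`, `run_naive`, `run_composed`, and `fused_stencil_scale` are hypothetical names invented for illustration, and the "device" only counts transfers rather than talking to a GPU.

```python
class FakeDevice:
    """Hypothetical stand-in for a GPU: counts transfers instead of doing them."""

    def __init__(self):
        self.transfers = 0

    def to_device(self, data):
        self.transfers += 1
        return list(data)

    def to_host(self, data):
        self.transfers += 1
        return list(data)


def stencil_kernel(buf):
    """3-point averaging stencil on interior points; boundaries are copied."""
    out = list(buf)
    for i in range(1, len(buf) - 1):
        out[i] = (buf[i - 1] + buf[i] + buf[i + 1]) / 3.0
    return out


def scale_kernel(buf, factor=2.0):
    """Non-stencil elementwise operation."""
    return [factor * x for x in buf]


def run_naive(device, data, kernels):
    """Each call copies its input to the device and its output back."""
    for kernel in kernels:
        dev_buf = device.to_device(data)
        data = device.to_host(kernel(dev_buf))
    return data


def run_composed(device, data, kernels):
    """Intermediate results stay on the device: one copy in, one copy out."""
    dev_buf = device.to_device(data)
    for kernel in kernels:
        dev_buf = kernel(dev_buf)
    return device.to_host(dev_buf)


def fused_stencil_scale(buf, factor=2.0):
    """Loop fusion: stencil and scaling in one pass, no intermediate buffer."""
    out = [factor * x for x in buf]  # boundary values: scaled copies
    for i in range(1, len(buf) - 1):
        out[i] = factor * ((buf[i - 1] + buf[i] + buf[i + 1]) / 3.0)
    return out
```

In this model, composing two calls naively costs four transfers, while the composed version costs two regardless of how many kernels are chained. The fused variant goes further by also eliminating the intermediate buffer between the stencil and the elementwise step.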

