Towards a Domain-Extensible Compiler: Optimizing an Image Processing Pipeline on Mobile CPUs

2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). Pub Date: 2021-02-27. DOI: 10.1109/CGO51591.2021.9370337

T. Koehler, Michel Steuwer


Citations: 5

Abstract

Halide and many similar projects have demonstrated the great potential of domain-specific optimizing compilers. They enable programs to be expressed at a convenient high level while generating high-performance code for parallel architectures. As domains of interest expand towards deep learning, probabilistic programming and beyond, it becomes increasingly clear that redesigning a domain-specific compiler for each new domain is unsustainable. In addition, the rapidly growing number of hardware architectures to optimize for poses great challenges for designing these compilers. In this paper, we show how to extend a unifying domain-extensible compiler with domain-specific as well as hardware-specific optimizations. The compiler operates on generic patterns that have proven flexible enough to express a wide range of computations. Optimizations are not hard-coded into the compiler but are expressed as user-defined rewrite rules that are composed into strategies controlling the optimization process. Crucially, both computational patterns and optimization strategies are extensible without modifying the core compiler implementation. We demonstrate that this domain-extensible compiler design is capable of expressing image processing pipelines and well-known image processing optimizations. Our results on four mobile ARM multi-core CPUs, often used for image processing tasks, show that the code generated for the Harris operator outperforms the image processing library OpenCV by up to 16× and achieves performance close to, or even up to 1.4× better than, the state-of-the-art image processing compiler Halide.
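To make the idea of rewrite rules composed into strategies more concrete, the following is a minimal sketch in Python. It is not the paper's compiler or its API: the tiny expression type (Var, Map, Compose), the map-fusion rule fuse_maps, and the combinators seq, lchoice, try_, repeat, top_down and normalize are all hypothetical names chosen for illustration. The sketch assumes a strategy is simply a function that returns a rewritten expression, or None when it does not apply.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical mini-IR (an assumption, not the paper's patterns):
# just enough structure to show a rule firing inside a strategy.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Compose:
    outer: object  # function applied second
    inner: object  # function applied first

@dataclass(frozen=True)
class Map:
    fn: object   # a function name or a Compose
    arg: object  # a Var or a nested Map

# A strategy returns the rewritten expression, or None if it fails.
Strategy = Callable[[object], Optional[object]]

def fuse_maps(e):
    """Rewrite rule: map(f, map(g, x)) -> map(f . g, x)."""
    if isinstance(e, Map) and isinstance(e.arg, Map):
        return Map(Compose(e.fn, e.arg.fn), e.arg.arg)
    return None

def seq(first, second):
    """Apply first, then second; fail if either fails."""
    def run(e):
        r = first(e)
        return None if r is None else second(r)
    return run

def lchoice(first, second):
    """Apply first; if it fails, apply second instead."""
    def run(e):
        r = first(e)
        return r if r is not None else second(e)
    return run

def try_(s):
    """Apply s if possible, otherwise leave the expression unchanged."""
    return lchoice(s, lambda e: e)

def repeat(s):
    """Apply s until it fails; return the last successful result."""
    def run(e):
        r = s(e)
        while r is not None:
            e, r = r, s(r)
        return e
    return run

def top_down(s):
    """Apply s at the outermost position where it succeeds."""
    def run(e):
        r = s(e)
        if r is not None:
            return r
        if isinstance(e, Map):
            inner = run(e.arg)
            if inner is not None:
                return Map(e.fn, inner)
        return None
    return run

def normalize(s):
    """Keep applying s anywhere in the expression until no position matches."""
    return repeat(top_down(s))

# Three chained maps are fused into a single traversal of the input.
pipeline = Map("f", Map("g", Map("h", Var("img"))))
fused = normalize(fuse_maps)(pipeline)
```

In this sketch, adding an optimization means writing another rule function and combining it with the existing combinators; the combinators themselves stay unchanged, which mirrors, in a very simplified way, the extensibility the abstract describes.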

