{"title":"Fusing the Polyhedral and Tensor Compilers to Accelerate Scientific Computing Kernels","authors":"Qingzhi Liu, Changbo Chen, Hanwen Dai","doi":"10.1002/cpe.70164","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>Polyhedral compilers and tensor compilers have achieved great success on accelerating scientific computing kernels and deep learning networks, respectively. Although much work has been done to integrate techniques of the polyhedral model to tensor compilers for accelerating deep learning, leveraging the powerful auto-tuning ability of modern tensor compilers to accelerate more general scientific computing kernels is challenging and is still at its dawn. In this work, we introduce a method to accelerate a family of basic scientific computing kernels by fusing the polyhedral compiler Pluto and the tensor compiler Tensor Virtual Machine (TVM) to generate efficient implementations targeting the heterogeneous CPU/GPU platform. The fusion is done in four steps: building a polyhedral model for the loop description of a given scientific kernel; designing schedules to transform the polyhedral model to new ones to enable rectangular tiling and expose explicit parallelism; selecting a new polyhedral model and converting it to the tensor compute representation; auto-tuning the tensor compute to generate efficient implementations on both CPUs and GPUs. Shifting and padding optimizations are also considered to avoid conditionals. Experiments on 30 typical scientific computing kernels show that our method achieves <span></span><math>\n <semantics>\n <mrow>\n <mn>3</mn>\n <mo>.</mo>\n <mn>31</mn>\n <mo>×</mo>\n </mrow>\n <annotation>$$ 3.31\\times $$</annotation>\n </semantics></math> speedup on average over a typical polyhedral compiler PPCG on GPU.</p>\n </div>","PeriodicalId":55214,"journal":{"name":"Concurrency and Computation-Practice & Experience","volume":"37 15-17","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Concurrency and Computation-Practice & Experience","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cpe.70164","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0
Abstract
Polyhedral compilers and tensor compilers have achieved great success in accelerating scientific computing kernels and deep learning networks, respectively. Although much work has been done to integrate techniques of the polyhedral model into tensor compilers for accelerating deep learning, leveraging the powerful auto-tuning ability of modern tensor compilers to accelerate more general scientific computing kernels is challenging and still in its infancy. In this work, we introduce a method to accelerate a family of basic scientific computing kernels by fusing the polyhedral compiler Pluto and the tensor compiler Tensor Virtual Machine (TVM) to generate efficient implementations targeting heterogeneous CPU/GPU platforms. The fusion is done in four steps: building a polyhedral model from the loop description of a given scientific kernel; designing schedules that transform the polyhedral model into new ones to enable rectangular tiling and expose explicit parallelism; selecting a new polyhedral model and converting it to the tensor compute representation; and auto-tuning the tensor compute to generate efficient implementations on both CPUs and GPUs. Shifting and padding optimizations are also considered to avoid conditionals. Experiments on 30 typical scientific computing kernels show that our method achieves a 3.31× speedup on average over PPCG, a typical polyhedral compiler, on GPU.
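The abstract describes the pipeline without code; the sketch below is a minimal, hypothetical illustration of steps three and four, expressing a 1-D three-point stencil (a typical scientific kernel, chosen here purely for illustration) as a TVM tensor expression and applying a rectangular loop split with parallelization. The kernel, sizes, and schedule parameters are assumptions and this is not the paper's actual conversion pipeline; it uses TVM's classic te/schedule API (available in releases that still ship te.create_schedule) rather than the paper's tooling.

import numpy as np
import tvm
from tvm import te

# Hypothetical kernel: a 1-D three-point averaging stencil, standing in
# for the "tensor compute representation" the paper converts its
# polyhedral models into. Sizes and split factors are illustrative.
N = 1026  # chosen so the output extent (N - 2 = 1024) splits evenly by 64

A = te.placeholder((N,), name="A", dtype="float32")
B = te.compute(
    (N - 2,),
    lambda i: (A[i] + A[i + 1] + A[i + 2]) / 3.0,
    name="B",
)

# Rectangular tiling of the single output loop: split it into an outer
# and an inner loop, parallelize the outer one, vectorize the inner one.
s = te.create_schedule(B.op)
xo, xi = s[B].split(B.op.axis[0], factor=64)
s[B].parallel(xo)
s[B].vectorize(xi)

# Build for CPU; a GPU build would additionally need thread bindings.
func = tvm.build(s, [A, B], target="llvm")

# Run on random data and check against a NumPy reference.
dev = tvm.cpu()
a_np = np.random.rand(N).astype("float32")
a = tvm.nd.array(a_np, dev)
b = tvm.nd.array(np.zeros(N - 2, dtype="float32"), dev)
func(a, b)
np.testing.assert_allclose(
    b.numpy(), (a_np[:-2] + a_np[1:-1] + a_np[2:]) / 3.0, rtol=1e-5
)

In the paper's setting, Pluto-style shifting and padding would first remove boundary conditionals so the iteration space is rectangular, which is what makes a pure te.compute like this expressible; the fixed split/parallel/vectorize schedule here stands in for the parameters TVM's auto-tuner would search over in step four.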
Journal introduction:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality original research papers and authoritative research review papers in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.