FOURST: A code generator for FFT-based fast stencil computations
Zafar Ahmad, M. Javanmard, Gregory Croisdale, Aaron Gregory, P. Ganapathi, L. Pouchet, R. Chowdhury
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
Published: 2022-05-01
DOI: https://doi.org/10.1109/ispass55109.2022.00010
Citations: 1
Abstract
Stencil computations are ubiquitous in modern grid-based physical simulations. In this paper, we present FOURST, a compiler that generates programs for computing time-iterated linear stencils, both periodic and aperiodic, using fast Fourier transform (FFT) methods. We outline the design and implementation of FOURST's code generation approach, which automatically produces FFT-based stencil solvers. We present experimental results on the state-of-the-art Ookami supercomputer, which houses Fujitsu A64FX and Intel Skylake processors, comparing the performance of FOURST with that of PLuTo, a state-of-the-art tiling-based optimizing code generator, across various stencil shapes and numbers of time iterations. We discuss the performance profiles and limitations of both approaches on high-end modern hardware.
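To illustrate the general idea behind FFT-based stencil solvers, the following minimal sketch covers only the periodic 1D case. It is not FOURST's generated code or its algorithm; the function names (`fft`, `stencil_naive`, `stencil_fft`), the offset-to-weight dictionary format, and the power-of-two grid size are all assumptions made for this example. One step of a linear periodic stencil is a circular convolution, so T steps reduce to raising the kernel's spectrum to the power T in Fourier space.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized). len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def stencil_naive(u, weights, steps):
    """Reference: apply the periodic stencil `steps` times by direct looping.

    `weights` maps a grid offset d to its coefficient, so one step computes
    new[i] = sum over d of weights[d] * u[(i + d) % n].
    """
    n = len(u)
    for _ in range(steps):
        u = [sum(w * u[(i + d) % n] for d, w in weights.items())
             for i in range(n)]
    return u


def stencil_fft(u, weights, steps):
    """Apply the same stencil `steps` times with one forward and one inverse FFT."""
    n = len(u)
    # Place each weight so that circular convolution with `kernel`
    # reproduces exactly one stencil step.
    kernel = [0.0] * n
    for d, w in weights.items():
        kernel[(-d) % n] += w
    K = fft(kernel)
    U = fft(u)
    # Convolution theorem: `steps` repeated convolutions become K ** steps.
    V = [Uk * Kk ** steps for Uk, Kk in zip(U, K)]
    v = fft(V, inverse=True)
    return [z.real / n for z in v]  # normalize the inverse transform


# Hypothetical usage: a 3-point averaging stencil on an 8-point periodic grid.
grid = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
heat = {-1: 0.25, 0: 0.5, 1: 0.25}
direct = stencil_naive(grid, heat, 5)
fast = stencil_fft(grid, heat, 5)
```

In this sketch the direct loop costs time proportional to the number of steps, while the FFT route costs two transforms regardless of the step count. Aperiodic boundaries, which the abstract also mentions, need additional handling that is not shown here.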