SVE上的FFTE:螺旋生成内核

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Pub Date : 2020-01-15 DOI:10.1145/3368474.3368488

D. Takahashi, F. Franchetti

{"title":"SVE上的FFTE:螺旋生成内核","authors":"D. Takahashi, F. Franchetti","doi":"10.1145/3368474.3368488","DOIUrl":null,"url":null,"abstract":"In this paper we propose an implementation of the fast Fourier transform (FFT) targeting the ARM Scalable Vector Extension (SVE). We performed automatic vectorization via a compiler and an explicit vectorization through code generation by SPIRAL for FFT kernels, and compared the performance. We show that the explicit vectorization of SPIRAL generated code improves performance significantly. Performance results of FFTs on RIKEN's Fugaku processor simulator are reported. With the ARM compiler SPIRAL-generated FFT kernels written in SVE intrinsic are up to 3.16 times faster than FFT kernels of FFTE written in Fortran and up to 5.62 times faster than SPIRAL-generated FFT kernels written in C.","PeriodicalId":314778,"journal":{"name":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"FFTE on SVE: SPIRAL-Generated Kernels\",\"authors\":\"D. Takahashi, F. Franchetti\",\"doi\":\"10.1145/3368474.3368488\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we propose an implementation of the fast Fourier transform (FFT) targeting the ARM Scalable Vector Extension (SVE). We performed automatic vectorization via a compiler and an explicit vectorization through code generation by SPIRAL for FFT kernels, and compared the performance. We show that the explicit vectorization of SPIRAL generated code improves performance significantly. Performance results of FFTs on RIKEN's Fugaku processor simulator are reported. With the ARM compiler SPIRAL-generated FFT kernels written in SVE intrinsic are up to 3.16 times faster than FFT kernels of FFTE written in Fortran and up to 5.62 times faster than SPIRAL-generated FFT kernels written in C.\",\"PeriodicalId\":314778,\"journal\":{\"name\":\"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region\",\"volume\":\"40 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-01-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3368474.3368488\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3368474.3368488","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

本文提出了一种针对ARM可扩展向量扩展(SVE)的快速傅里叶变换(FFT)实现方法。我们通过编译器进行了自动向量化，并通过螺旋生成代码对FFT内核进行了显式向量化，并比较了性能。我们证明了螺旋生成代码的显式向量化显著提高了性能。本文报道了FFTs在RIKEN的Fugaku处理器模拟器上的性能结果。使用ARM编译器，用SVE编写的螺旋生成FFT内核比用Fortran编写的FFTE的FFT内核快3.16倍，比用C编写的螺旋生成FFT内核快5.62倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

FFTE on SVE: SPIRAL-Generated Kernels

In this paper we propose an implementation of the fast Fourier transform (FFT) targeting the ARM Scalable Vector Extension (SVE). We performed automatic vectorization via a compiler and an explicit vectorization through code generation by SPIRAL for FFT kernels, and compared the performance. We show that the explicit vectorization of SPIRAL generated code improves performance significantly. Performance results of FFTs on RIKEN's Fugaku processor simulator are reported. With the ARM compiler SPIRAL-generated FFT kernels written in SVE intrinsic are up to 3.16 times faster than FFT kernels of FFTE written in Fortran and up to 5.62 times faster than SPIRAL-generated FFT kernels written in C.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region

自引率

0.00%

发文量