Generating Fast FFT Kernels on CPUs via FFT-Specific Intrinsics

Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming. Pub Date: 2023-02-25. DOI: 10.1145/3572848.3577477

Zhihao Li, Haipeng Jia, Yunquan Zhang, Yuyan Sun, Yiwei Zhang, Tun Chen


Citations: 0

Abstract

This paper proposes FFTASI, a fast Fourier transform (FFT) code generation framework based on algorithm-specific instructions (ASIs). FFTASI generates unified, architecture-independent butterfly kernels, which are then transformed into architecture-dependent kernels by mapping ASIs to the architecture-specific instructions of each hardware platform. FFTASI strikes a good balance between performance and productivity on CPUs.
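The general idea of writing a butterfly kernel once against abstract operations and then binding those operations to a concrete backend can be sketched as follows. This is only an illustrative sketch, not FFTASI itself: the operation names (`add`, `sub`, `mul`, `from_complex`, `to_complex`), the two backends `NATIVE` and `SPLIT`, and the helper `run` are all hypothetical, and plain Python functions stand in for the architecture-specific instructions that the abstract refers to.

```python
import cmath

# Hypothetical backend 1: native Python complex numbers.
NATIVE = {
    "from_complex": lambda z: z,
    "to_complex": lambda v: v,
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

# Hypothetical backend 2: (re, im) pairs of floats, standing in for a target
# that has no complex multiply and must spell it out in real arithmetic.
SPLIT = {
    "from_complex": lambda z: (z.real, z.imag),
    "to_complex": lambda v: complex(v[0], v[1]),
    "add": lambda a, b: (a[0] + b[0], a[1] + b[1]),
    "sub": lambda a, b: (a[0] - b[0], a[1] - b[1]),
    "mul": lambda a, b: (a[0] * b[0] - a[1] * b[1],
                         a[0] * b[1] + a[1] * b[0]),
}


def butterfly2(ops, a, b, w):
    """Radix-2 butterfly expressed only through the abstract operations."""
    t = ops["mul"](b, w)
    return ops["add"](a, t), ops["sub"](a, t)


def fft(ops, values):
    """Recursive radix-2 decimation-in-time FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(ops, values[0::2])
    odd = fft(ops, values[1::2])
    out = [None] * n
    for k in range(n // 2):
        # Twiddle factor, converted into the backend's own representation.
        w = ops["from_complex"](cmath.exp(-2j * cmath.pi * k / n))
        out[k], out[k + n // 2] = butterfly2(ops, even[k], odd[k], w)
    return out


def run(ops, xs):
    """Convert inputs to the backend representation, transform, convert back."""
    data = [ops["from_complex"](complex(x)) for x in xs]
    return [ops["to_complex"](v) for v in fft(ops, data)]
```

Calling `run(NATIVE, [1, 2, 3, 4])` and `run(SPLIT, [1, 2, 3, 4])` executes the same `butterfly2` code through two different mappings. In a real code generator the mapping would presumably emit SIMD intrinsics for each CPU rather than call Python functions.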
