FPGA加速器中编译器优化的性能评估

Anais do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD) Pub Date : 2019-11-08 DOI:10.5753/wscad.2019.8681

Gustavo Leite, A. Baldassin, G. Araújo, J. N. Amaral

{"title":"FPGA加速器中编译器优化的性能评估","authors":"Gustavo Leite, A. Baldassin, G. Araújo, J. N. Amaral","doi":"10.5753/wscad.2019.8681","DOIUrl":null,"url":null,"abstract":"With the increasing power wall in microprocessor design, engineers shifted their attention to heterogeneous architectures, wherein several classes of devices are used for computation. Among them are FPGAs which offer comparable performance to CPUs while consuming only a fraction of energy. Despite the increasing interest in these devices, programmability and performance engineering in FPGAs remain hard. This work presents an evaluation of the most prominent code transformations targeting FPGAs. More specifically, it studies the performance effect of unrolling loops, replicating compute units and transferring data using DMA in a matrix multiplication OpenCL kernel through an Intel® FPGA. The results indicate that these optimizations can achieve speedups up to 3.78× for a matrix multiplication application, and 412.5× speedup in data transfer.","PeriodicalId":117711,"journal":{"name":"Anais do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)","volume":"2007 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Performance Evaluation of Compiler Optimizations in FPGA Accelerators\",\"authors\":\"Gustavo Leite, A. Baldassin, G. Araújo, J. N. Amaral\",\"doi\":\"10.5753/wscad.2019.8681\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the increasing power wall in microprocessor design, engineers shifted their attention to heterogeneous architectures, wherein several classes of devices are used for computation. Among them are FPGAs which offer comparable performance to CPUs while consuming only a fraction of energy. Despite the increasing interest in these devices, programmability and performance engineering in FPGAs remain hard. This work presents an evaluation of the most prominent code transformations targeting FPGAs. More specifically, it studies the performance effect of unrolling loops, replicating compute units and transferring data using DMA in a matrix multiplication OpenCL kernel through an Intel® FPGA. The results indicate that these optimizations can achieve speedups up to 3.78× for a matrix multiplication application, and 412.5× speedup in data transfer.\",\"PeriodicalId\":117711,\"journal\":{\"name\":\"Anais do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)\",\"volume\":\"2007 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-11-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Anais do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5753/wscad.2019.8681\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Anais do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5753/wscad.2019.8681","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

随着微处理器设计中功率墙的增加，工程师们将注意力转移到异构架构上，其中几种类型的设备用于计算。其中包括fpga，它提供与cpu相当的性能，同时只消耗一小部分能量。尽管人们对这些器件越来越感兴趣，但fpga的可编程性和性能工程仍然很困难。这项工作提出了针对fpga的最突出的代码转换的评估。更具体地说，它研究了通过Intel®FPGA在矩阵乘法OpenCL内核中使用DMA展开循环、复制计算单元和传输数据的性能影响。结果表明，对于矩阵乘法应用程序，这些优化可以实现高达3.78倍的加速，在数据传输方面可以实现412.5倍的加速。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Performance Evaluation of Compiler Optimizations in FPGA Accelerators

With the increasing power wall in microprocessor design, engineers shifted their attention to heterogeneous architectures, wherein several classes of devices are used for computation. Among them are FPGAs which offer comparable performance to CPUs while consuming only a fraction of energy. Despite the increasing interest in these devices, programmability and performance engineering in FPGAs remain hard. This work presents an evaluation of the most prominent code transformations targeting FPGAs. More specifically, it studies the performance effect of unrolling loops, replicating compute units and transferring data using DMA in a matrix multiplication OpenCL kernel through an Intel® FPGA. The results indicate that these optimizations can achieve speedups up to 3.78× for a matrix multiplication application, and 412.5× speedup in data transfer.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Anais do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD)

自引率

0.00%

发文量