A unified OpenCL-flavor programming model with scalable hybrid hardware platform on FPGAs

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) Pub Date : 2014-12-01 DOI:10.1109/ReConFig.2014.7032563

Hongyuan Ding, Miaoqing Huang


Citations: 10

Abstract


Hardware accelerators can deliver significant performance improvements, but designing them costs flexibility and productivity. Combining hardware accelerators with a multiprocessor system-on-chip (MPSoC) is an alternative way to balance flexibility, productivity, and performance. In this work, we present a unified hybrid OpenCL-flavor (HOpenCL) parallel programming model on MPSoC that supports both hardware and software kernels. By integrating the HOpenCL hardware IPs and software libraries, the same kernel function can execute either as a hardware kernel on dedicated hardware accelerators or as a software kernel on general-purpose processors. An automatic design flow generates the corresponding hybrid hardware platform along with the executable. We use 512×512 matrix multiplication to examine the potential of our hybrid system in terms of performance, scalability, and productivity. The results show that hardware kernels achieve more than a 10-fold speedup over software kernels. Our prototype platform also scales well as the number of group computation units (GCUs) increases from 1 to 6, until the problem becomes memory bound. Compared with the hard ARM core on the Zynq 7045 device, one ARM core performs on par with 2 or 3 GCUs running software kernels, whereas a single GCU with a hardware kernel implementation is 5 times faster than the ARM core.
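As a rough illustration of the OpenCL-style execution model the abstract refers to, the sketch below writes matrix multiplication as a per-work-item kernel and runs it through a simple software dispatcher that walks the index space one work-group at a time. This is not the HOpenCL API: the names `matmul_kernel` and `run_ndrange`, the flat row-major layout, and the 8×8 problem size (a stand-in for the 512×512 case) are all assumptions made for the example, and the dispatcher only models the software-kernel path on a single host.

```python
from itertools import product


def matmul_kernel(global_id, a, b, c, n):
    """One work-item: compute a single element of C = A x B (row-major lists)."""
    row, col = global_id
    acc = 0
    for k in range(n):
        acc += a[row * n + k] * b[k * n + col]
    c[row * n + col] = acc


def run_ndrange(kernel, global_size, local_size, *args):
    """Hypothetical software dispatcher (not part of HOpenCL).

    Splits the 2-D index space into work-groups and runs every work-item of
    each group in turn. A work-group is the unit a platform could hand to one
    compute unit; here the groups simply execute one after another.
    Returns the number of work-groups launched.
    """
    rows, cols = global_size
    local_rows, local_cols = local_size
    if rows % local_rows or cols % local_cols:
        raise ValueError("global size must be a multiple of local size")
    groups = 0
    for group_row, group_col in product(range(rows // local_rows),
                                        range(cols // local_cols)):
        groups += 1
        for local_row, local_col in product(range(local_rows),
                                            range(local_cols)):
            global_id = (group_row * local_rows + local_row,
                         group_col * local_cols + local_col)
            kernel(global_id, *args)
    return groups


n = 8  # small stand-in for the 512x512 case in the abstract
a = [i % 7 for i in range(n * n)]
b = [(i * 3) % 5 for i in range(n * n)]
c = [0] * (n * n)
groups_launched = run_ndrange(matmul_kernel, (n, n), (4, 4), a, b, c, n)
```

Because the kernel only sees its own global ID and the buffers, the dispatcher is free to decide where each work-group runs. That separation is what would let the same kernel source target either a software or a hardware back end in a model of this kind.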
