Evaluating Radial Basis Function Kernel on OpenCL FPGA Platform
Authors: Zheming Jin, H. Finkel
Venue: 2018 Ninth International Green and Sustainable Computing Conference (IGSC)
Published: 2018-10-01
DOI: 10.1109/IGCC.2018.8752172 (https://doi.org/10.1109/IGCC.2018.8752172)
Citations: 1
Abstract
Field-programmable gate arrays (FPGAs) are becoming a promising heterogeneous computing component for scientific computing as floating-point-optimized architectures are added to current FPGAs. Emerging high-level synthesis (HLS) tools provide a streamlined design flow that makes FPGAs accessible to researchers with little FPGA development experience. In this paper, we choose the radial basis function (RBF) kernel of a support vector machine as a case study to evaluate both the potential of implementing machine learning kernels on FPGAs and the ability of an HLS tool to convert a kernel written in a high-level language into an FPGA implementation. We explain the HLS flow and the RBF kernel, evaluate the kernel in an OpenCL-to-FPGA HLS flow, and describe the optimizations applied to it. Our optimizations, which use kernel vectorization and loop unrolling, improve kernel performance by a factor of 15.8 over a baseline kernel on the Nallatech 385A FPGA card, which features an Intel Arria 10 GX 1150 FPGA. In terms of energy efficiency, the performance per watt of the FPGA platform is 2.8X higher than that of an Intel Xeon 16-core CPU and 1.7X higher than that of an Nvidia Tesla K80 GPU. On the other hand, the performance per watt of an Intel Xeon Phi Knights Landing CPU and of an Nvidia Tesla P100 GPU is 5.3X and 1.7X higher than that of the FPGA, respectively.