Authors: N. Nakasato, H. Daisaka, T. Ishikawa
Venue: 2018 International Conference on Field-Programmable Technology (FPT)
Published: 2018-12-01
DOI: 10.1109/FPT.2018.00049 (https://doi.org/10.1109/FPT.2018.00049)
High Performance High-Precision Floating-Point Operations on FPGAs Using OpenCL
The development of high-level synthesis tools such as OpenCL SDKs for FPGAs enables us to design accelerators for scientific applications that take advantage of the flexibility and efficiency of FPGAs. However, the available OpenCL SDKs support only the standard floating-point (FP) formats. In this paper, we present a performance evaluation of high-precision FP operations, which are currently not supported in OpenCL, on recent FPGAs. Using a mechanism to call a custom design from an OpenCL kernel, we evaluate the performance of a sample application in the high-precision FP format binary128. We found that the sustained performance of our design in binary128 on Intel Arria 10 and Stratix 10 is 19 and 71 Gflops, respectively.