{"title":"fpga上优化OpenCL应用的性能分析框架","authors":"Ze-ke Wang, Bingsheng He, Wei Zhang, Shunning Jiang","doi":"10.1109/HPCA.2016.7446058","DOIUrl":null,"url":null,"abstract":"Recently, FPGA vendors such as Altera and Xilinx have released OpenCL SDK for programming FPGAs. However, the architecture of FPGA is significantly different from that of CPU/GPU, for which OpenCL is originally designed. Tuning the OpenCL code for good performance on FPGAs is still an open problem, since the existing OpenCL tools and models designed for CPUs/GPUs are not directly applicable to FPGAs. In the paper, we present an FPGA-based performance analysis framework that can shed light on the performance bottleneck and thus guide the code tuning for OpenCL applications on FPGAs. Particularly, we leverage static and dynamic analysis to develop an analytical performance model, which has captured the key architectural features of FPGA abstractions under OpenCL. Then, we provide four programmer-interpretable metrics to quantify the performance potentials of the OpenCL program with input optimization combination for the next optimization step. We evaluate our framework with a number of user cases, and demonstrate that 1) our analytical performance model can accurately predict the performance of OpenCL programs with different optimization combinations on FPGAs, and 2) our tool can be used to effectively guide the code tuning on alleviating the performance bottleneck.","PeriodicalId":417994,"journal":{"name":"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"82","resultStr":"{\"title\":\"A performance analysis framework for optimizing OpenCL applications on FPGAs\",\"authors\":\"Ze-ke Wang, Bingsheng He, Wei Zhang, Shunning Jiang\",\"doi\":\"10.1109/HPCA.2016.7446058\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, FPGA vendors such as Altera and Xilinx have released OpenCL SDK for programming FPGAs. However, the architecture of FPGA is significantly different from that of CPU/GPU, for which OpenCL is originally designed. Tuning the OpenCL code for good performance on FPGAs is still an open problem, since the existing OpenCL tools and models designed for CPUs/GPUs are not directly applicable to FPGAs. In the paper, we present an FPGA-based performance analysis framework that can shed light on the performance bottleneck and thus guide the code tuning for OpenCL applications on FPGAs. Particularly, we leverage static and dynamic analysis to develop an analytical performance model, which has captured the key architectural features of FPGA abstractions under OpenCL. Then, we provide four programmer-interpretable metrics to quantify the performance potentials of the OpenCL program with input optimization combination for the next optimization step. We evaluate our framework with a number of user cases, and demonstrate that 1) our analytical performance model can accurately predict the performance of OpenCL programs with different optimization combinations on FPGAs, and 2) our tool can be used to effectively guide the code tuning on alleviating the performance bottleneck.\",\"PeriodicalId\":417994,\"journal\":{\"name\":\"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-03-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"82\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2016.7446058\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2016.7446058","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A performance analysis framework for optimizing OpenCL applications on FPGAs
Recently, FPGA vendors such as Altera and Xilinx have released OpenCL SDK for programming FPGAs. However, the architecture of FPGA is significantly different from that of CPU/GPU, for which OpenCL is originally designed. Tuning the OpenCL code for good performance on FPGAs is still an open problem, since the existing OpenCL tools and models designed for CPUs/GPUs are not directly applicable to FPGAs. In the paper, we present an FPGA-based performance analysis framework that can shed light on the performance bottleneck and thus guide the code tuning for OpenCL applications on FPGAs. Particularly, we leverage static and dynamic analysis to develop an analytical performance model, which has captured the key architectural features of FPGA abstractions under OpenCL. Then, we provide four programmer-interpretable metrics to quantify the performance potentials of the OpenCL program with input optimization combination for the next optimization step. We evaluate our framework with a number of user cases, and demonstrate that 1) our analytical performance model can accurately predict the performance of OpenCL programs with different optimization combinations on FPGAs, and 2) our tool can be used to effectively guide the code tuning on alleviating the performance bottleneck.