Jannatun Naher, C. Gloster, C. Doss, Shrikant S. Jadhav
Using Machine Learning to Estimate Utilization and Throughput for OpenCL-Based Matrix-Vector Multiplication (MVM)
2020 10th Annual Computing and Communication Workshop and Conference (CCWC), 2020
DOI: https://doi.org/10.1109/CCWC47524.2020.9031173
Citations: 3
Abstract
OpenCL is a framework for writing programs that execute across heterogeneous platforms, including FPGAs. It lets users write standardized C-like code for both the host and the hardware accelerators, which reduces the programming challenge for FPGAs. Hardware descriptions can be written in OpenCL using different memory access and data partitioning strategies. Matrix-Vector Multiplication (MVM) is the critical computational bottleneck for many solvers of Systems of Linear Equations (SLEs). The MVM OpenCL kernel can be optimized by varying several design parameters in the OpenCL description, which improves hardware performance. To explore the design space effectively, logic synthesis is performed after each change of design parameters to determine its impact on design area and performance. However, each synthesis run can take multiple hours, so manual design space exploration over a large number of designs is prohibitive. To address this challenge, predicting FPGA utilization and throughput can significantly reduce design time. This paper presents a machine learning-based approach to estimating FPGA utilization and throughput for a given set of design parameter values. It also presents an optimized MVM implementation obtained after compiling, synthesizing, and executing over 100 designs. Using the Random Forest machine learning algorithm as the estimator, the average error over 175 designs is 0.0098%, 0.0012%, 0.0039%, 0.0414%, and 123.21% for estimating Look-up Tables (LUTs), Digital Signal Processors (DSPs), memory bits, RAM blocks, and throughput (GFLOPs), respectively.
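To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows a reference dense MVM, a throughput figure in GFLOPs, and an average percentage error. The function names are hypothetical. Two assumptions are made that the abstract does not state: throughput counts one multiply and one add per matrix element (the usual 2·M·N convention for dense MVM), and "average error" is read as mean absolute percentage error.

```python
def mvm(matrix, vector):
    """Reference dense matrix-vector product y = A x, using plain lists."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("each matrix row must have the same length as the vector")
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def mvm_gflops(rows, cols, seconds):
    """Throughput in GFLOPs, assuming 2 floating-point operations per matrix element."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return (2.0 * rows * cols) / seconds / 1e9


def mean_abs_percent_error(actual, predicted):
    """Mean of |predicted - actual| / |actual|, in percent (assumed error metric)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

In a flow like the one described, `mean_abs_percent_error` would be applied separately to each estimated quantity (LUTs, DSPs, memory bits, RAM blocks, throughput), comparing the values measured after synthesis and execution against the estimator's predictions.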