Using Machine Learning to Estimate Utilization and Throughput for OpenCL-Based Matrix-Vector Multiplication (MVM)

2020 10th Annual Computing and Communication Workshop and Conference (CCWC). Pub Date: 2020-01-01. DOI: 10.1109/CCWC47524.2020.9031173

Jannatun Naher, C. Gloster, C. Doss, Shrikant S. Jadhav


Citations: 3

Abstract

OpenCL is a framework for writing programs that execute across heterogeneous platforms, including FPGAs. OpenCL allows users to write standardized C-like code for the host as well as for the hardware accelerators, thus reducing the programming challenge for FPGAs. Hardware descriptions can be written in OpenCL using different memory access and data partitioning strategies. Matrix-Vector Multiplication (MVM) is the critical computational bottleneck for many solvers of Systems of Linear Equations (SLEs). The MVM OpenCL kernel can be optimized by varying several design parameters in the OpenCL description, improving hardware performance. To explore the design space effectively, logic synthesis is performed after each change of design parameters to determine the impact on design area and performance. However, each of these synthesis runs can take multiple hours. Hence, manual design space exploration for a large number of designs is prohibitive. To address this challenge, predicting FPGA utilization and throughput can significantly reduce the design time. This paper presents a machine learning-based approach to estimating FPGA utilization and throughput for a given set of design parameter values. It also presents an optimized MVM implementation obtained after compiling, synthesizing, and executing over 100 designs. The Random Forest machine learning algorithm produces the estimates, and for 175 designs the average error is 0.0098%, 0.0012%, 0.0039%, 0.0414%, and 123.21% for estimating Look-up Tables (LUTs), Digital Signal Processors (DSPs), memory bits, RAM blocks, and throughput (GFLOPs), respectively.
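
As a rough illustration of why exhaustive exploration is costly and how an average percentage error might be computed, the following Python sketch enumerates a hypothetical design space and scores predictions against measurements. The parameter names, their value ranges, the 3-hour synthesis time, and the choice of mean absolute percentage error are all assumptions made for illustration; the abstract does not list the actual design parameters or define the error metric.

```python
from itertools import product

# Hypothetical design parameters and values (not taken from the paper).
DESIGN_SPACE = {
    "unroll_factor": [1, 2, 4, 8, 16],
    "simd_width": [1, 2, 4, 8, 16],
    "compute_units": [1, 2, 4, 8],
    "block_size": [32, 64, 128],
}


def enumerate_designs(space):
    """Return every combination of parameter values as a list of dicts."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*space.values())]


def synthesis_hours(num_designs, hours_per_run):
    """Total time if every design point is synthesized once, sequentially."""
    return num_designs * hours_per_run


def mean_abs_percent_error(measured, predicted):
    """Average of |predicted - measured| / |measured|, expressed in percent."""
    if not measured or len(measured) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(p - m) / abs(m) for m, p in zip(measured, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


designs = enumerate_designs(DESIGN_SPACE)
# Assumed 3 hours per synthesis run; the abstract only says "multiple hours".
total_hours = synthesis_hours(len(designs), hours_per_run=3.0)
print(f"{len(designs)} design points, about {total_hours:.0f} hours of synthesis")
```

Even this small hypothetical space would take weeks to synthesize exhaustively, which is the motivation for replacing most synthesis runs with a trained estimator and checking it on held-out designs with a metric such as `mean_abs_percent_error`.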

