{"title":"A High-Performance FPGA Accelerator for CUR Decomposition","authors":"M. Abdelgawad, R. Cheung, Hong Yan","doi":"10.1109/FPL57034.2022.00052","DOIUrl":null,"url":null,"abstract":"A matrix factorization is to decompose a matrix into a product of smaller matrices. It is widely used in machine learning algorithms. There are many matrix decomposition algorithms, and each has various applications. CUR matrix decomposition is a widely-used factorization tool that has been employed for dimension reduction and pattern recognition in many scientific and engineering applications, such as image processing, text mining, and wireless communications. In this paper we propose an efficient FPGA-based floating-point accelerator using high-level synthesis (HLS) for the CUR decomposition algorithm. Our experiment results demonstrate the better efficiency of our hardware design compared to the optimized CPU-based software solutions. The speedup of our FPGA-based architecture over the optimized software implementation ranges from 2.37 to 16.82 times for different dimensions of the data input matrix. We evaluated our design using large dimension matrices 1024 x 1024 and 2048 x 2048 and the experiment results demonstrated the efficiency of our design in terms of the utilized resources and latency. Finally, we have compared our design with other matrix decomposition algorithms such as SVD and QR decomposition, the experiment results demonstrated that CUR is more efficient than SVD and QR decomposition in terms of latency and required resources.","PeriodicalId":380116,"journal":{"name":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","volume":"268 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FPL57034.2022.00052","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
A matrix factorization is to decompose a matrix into a product of smaller matrices. It is widely used in machine learning algorithms. There are many matrix decomposition algorithms, and each has various applications. CUR matrix decomposition is a widely-used factorization tool that has been employed for dimension reduction and pattern recognition in many scientific and engineering applications, such as image processing, text mining, and wireless communications. In this paper we propose an efficient FPGA-based floating-point accelerator using high-level synthesis (HLS) for the CUR decomposition algorithm. Our experiment results demonstrate the better efficiency of our hardware design compared to the optimized CPU-based software solutions. The speedup of our FPGA-based architecture over the optimized software implementation ranges from 2.37 to 16.82 times for different dimensions of the data input matrix. We evaluated our design using large dimension matrices 1024 x 1024 and 2048 x 2048 and the experiment results demonstrated the efficiency of our design in terms of the utilized resources and latency. Finally, we have compared our design with other matrix decomposition algorithms such as SVD and QR decomposition, the experiment results demonstrated that CUR is more efficient than SVD and QR decomposition in terms of latency and required resources.
矩阵分解就是把一个矩阵分解成若干个较小矩阵的乘积。它被广泛应用于机器学习算法。有许多矩阵分解算法,每种算法都有不同的应用。CUR矩阵分解是一种广泛使用的分解工具,在许多科学和工程应用中用于降维和模式识别,例如图像处理、文本挖掘和无线通信。在本文中,我们提出了一种高效的基于fpga的浮点加速器,使用高级合成(HLS)进行CUR分解算法。实验结果表明,与优化后的基于cpu的软件解决方案相比,我们的硬件设计具有更高的效率。对于不同维度的数据输入矩阵,我们基于fpga的架构比优化软件实现的加速范围从2.37到16.82倍。我们使用大维度矩阵1024 x 1024和2048 x 2048来评估我们的设计,实验结果证明了我们的设计在利用资源和延迟方面的效率。最后,我们将我们的设计与其他矩阵分解算法(如SVD和QR分解)进行了比较,实验结果表明,在延迟和所需资源方面,CUR比SVD和QR分解更有效。