Accelerating Non-Negative Matrix Factorization on Embedded FPGA with Hybrid Logarithmic Dot-Product Approximation

Yizhi Chen, Yarib Nevarez, Zhonghai Lu, A. García-Ortiz
{"title":"Accelerating Non-Negative Matrix Factorization on Embedded FPGA with Hybrid Logarithmic Dot-Product Approximation","authors":"Yizhi Chen, Yarib Nevarez, Zhonghai Lu, A. García-Ortiz","doi":"10.1109/MCSoC57363.2022.00070","DOIUrl":null,"url":null,"abstract":"Non-negative matrix factorization (NMF) is an ef-fective method for dimensionality reduction and sparse decom-position. This method has been of great interest to the scien-tific community in applications including signal processing, data mining, compression, and pattern recognition. However, NMF implies elevated computational costs in terms of performance and energy consumption, which is inadequate for embedded applications. To overcome this limitation, we implement the vector dot-product with hybrid logarithmic approximation as a hardware optimization approach. This technique accelerates floating-point computation, reduces energy consumption, and preserves accuracy. To demonstrate our approach, we employ a design exploration flow using high-level synthesis on an embedded FPGA. Compared with software solutions on ARM CPU, this hardware implementation accelerates the overall computation to decompose matrix by $5.597\\times$ and reduces energy consumption by $69.323\\times$. Log approximation NMF combined with KNN(k-nearest neighbors) has only 2.38% decreasing accuracy compared with the result of KNN processing the matrix after floating-point NMF on MNIST. Further on, compared with a dedicated floating-point accelerator, the logarithmic approximation approach achieves $3.718\\times$ acceleration and $8.345\\times$ energy reduction. Compared with the fixed-point approach, our approach has an accuracy degradation of 1.93% on MNIST and an accuracy amelioration of 28.2% on the FASHION MNIST data set without pre-knowledge of the data range. Thus, our approach has better compatibility with the input data range.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MCSoC57363.2022.00070","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Non-negative matrix factorization (NMF) is an effective method for dimensionality reduction and sparse decomposition. It has attracted great interest from the scientific community in applications including signal processing, data mining, compression, and pattern recognition. However, NMF incurs elevated computational costs in terms of performance and energy consumption, which makes it ill-suited for embedded applications. To overcome this limitation, we implement the vector dot-product with a hybrid logarithmic approximation as a hardware optimization approach. This technique accelerates floating-point computation, reduces energy consumption, and preserves accuracy. To demonstrate our approach, we employ a design exploration flow using high-level synthesis on an embedded FPGA. Compared with a software solution on an ARM CPU, this hardware implementation accelerates the overall matrix-decomposition computation by $5.597\times$ and reduces energy consumption by $69.323\times$. On MNIST, KNN (k-nearest neighbors) classification on matrices produced by the log-approximation NMF loses only 2.38% accuracy compared with KNN on matrices produced by floating-point NMF. Furthermore, compared with a dedicated floating-point accelerator, the logarithmic approximation approach achieves $3.718\times$ acceleration and $8.345\times$ energy reduction. Compared with a fixed-point approach, and without prior knowledge of the data range, our approach shows an accuracy degradation of 1.93% on MNIST and an accuracy improvement of 28.2% on the Fashion-MNIST data set. Thus, our approach has better compatibility with the input data range.
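
The core optimization named in the abstract is a hybrid logarithmic approximation of the vector dot-product, the kernel that dominates multiplicative-update NMF (in the standard Lee-Seung rules, e.g. $H \leftarrow H \odot \frac{W^{T}V}{W^{T}WH}$, every entry of a factor update is a dot product of non-negative vectors). The abstract does not spell out the HLS design, so the following is a minimal C++ software sketch of one plausible reading of "hybrid": element-wise multiplications use a Mitchell-style logarithmic approximation ($\log_2(1+f) \approx f$), while the accumulation stays in exact floating point. The names `approx_mul` and `hybrid_dot` are illustrative, not from the paper.

```cpp
// Software model of a logarithmically approximated dot product.
// Illustrative sketch only; the paper's actual HLS implementation
// and its exact hybrid partitioning may differ.
#include <cmath>
#include <cstdio>
#include <vector>

// Mitchell-style approximate multiply for non-negative operands
// (NMF factors are non-negative by construction):
//   a = 2^ka * (1 + fa), b = 2^kb * (1 + fb), with fa, fb in [0, 1).
// Using log2(1 + f) ~= f, the product is approximated as
//   2^(ka + kb) * (1 + fa + fb),
// carrying into the exponent when fa + fb >= 1. No multiplier needed.
static float approx_mul(float a, float b) {
    if (a == 0.0f || b == 0.0f) return 0.0f;
    int ka, kb;
    float fa = std::frexp(a, &ka) * 2.0f - 1.0f; // mantissa [1,2) -> f in [0,1)
    float fb = std::frexp(b, &kb) * 2.0f - 1.0f;
    int   k  = (ka - 1) + (kb - 1);              // a = (1+fa)*2^(ka-1), etc.
    float f  = fa + fb;
    if (f >= 1.0f) { f -= 1.0f; ++k; }           // renormalize into [0,1)
    return std::ldexp(1.0f + f, k);              // antilog: 2^k * (1 + f)
}

// "Hybrid" dot product under the assumption above: log-approximate
// element-wise products, exact floating-point accumulation.
static float hybrid_dot(const std::vector<float>& x,
                        const std::vector<float>& y) {
    float acc = 0.0f;
    for (size_t i = 0; i < x.size(); ++i)
        acc += approx_mul(x[i], y[i]);
    return acc;
}

int main() {
    std::vector<float> x = {0.5f, 1.25f, 3.0f};
    std::vector<float> y = {2.0f, 0.75f, 1.5f};
    // Exact dot product: 0.5*2 + 1.25*0.75 + 3*1.5 = 6.4375;
    // Mitchell's approximation always underestimates slightly.
    std::printf("approx dot = %f\n", hybrid_dot(x, y));
    return 0;
}
```

In hardware, the `frexp`/`ldexp` steps correspond to exponent extraction and shifting, so each multiplier is replaced by narrow adders; that is the usual source of the speed and energy savings logarithmic multipliers offer, consistent with the gains reported in the abstract.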