Accelerating Non-Negative Matrix Factorization on Embedded FPGA with Hybrid Logarithmic Dot-Product Approximation

Yizhi Chen, Yarib Nevarez, Zhonghai Lu, A. García-Ortiz
{"title":"Accelerating Non-Negative Matrix Factorization on Embedded FPGA with Hybrid Logarithmic Dot-Product Approximation","authors":"Yizhi Chen, Yarib Nevarez, Zhonghai Lu, A. García-Ortiz","doi":"10.1109/MCSoC57363.2022.00070","DOIUrl":null,"url":null,"abstract":"Non-negative matrix factorization (NMF) is an ef-fective method for dimensionality reduction and sparse decom-position. This method has been of great interest to the scien-tific community in applications including signal processing, data mining, compression, and pattern recognition. However, NMF implies elevated computational costs in terms of performance and energy consumption, which is inadequate for embedded applications. To overcome this limitation, we implement the vector dot-product with hybrid logarithmic approximation as a hardware optimization approach. This technique accelerates floating-point computation, reduces energy consumption, and preserves accuracy. To demonstrate our approach, we employ a design exploration flow using high-level synthesis on an embedded FPGA. Compared with software solutions on ARM CPU, this hardware implementation accelerates the overall computation to decompose matrix by $5.597\\times$ and reduces energy consumption by $69.323\\times$. Log approximation NMF combined with KNN(k-nearest neighbors) has only 2.38% decreasing accuracy compared with the result of KNN processing the matrix after floating-point NMF on MNIST. Further on, compared with a dedicated floating-point accelerator, the logarithmic approximation approach achieves $3.718\\times$ acceleration and $8.345\\times$ energy reduction. Compared with the fixed-point approach, our approach has an accuracy degradation of 1.93% on MNIST and an accuracy amelioration of 28.2% on the FASHION MNIST data set without pre-knowledge of the data range. Thus, our approach has better compatibility with the input data range.","PeriodicalId":150801,"journal":{"name":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MCSoC57363.2022.00070","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Non-negative matrix factorization (NMF) is an effective method for dimensionality reduction and sparse decomposition. It has attracted great interest from the scientific community in applications including signal processing, data mining, compression, and pattern recognition. However, NMF incurs elevated computational costs in terms of performance and energy consumption, which makes it ill-suited for embedded applications. To overcome this limitation, we implement the vector dot-product with a hybrid logarithmic approximation as a hardware optimization approach. This technique accelerates floating-point computation, reduces energy consumption, and preserves accuracy. To demonstrate our approach, we employ a design exploration flow using high-level synthesis on an embedded FPGA. Compared with a software solution on an ARM CPU, this hardware implementation accelerates the overall matrix-decomposition computation by $5.597\times$ and reduces energy consumption by $69.323\times$. On MNIST, KNN (k-nearest neighbors) classification on matrices produced by the log-approximation NMF loses only 2.38% accuracy compared with KNN on matrices produced by floating-point NMF. Furthermore, compared with a dedicated floating-point accelerator, the logarithmic approximation approach achieves $3.718\times$ acceleration and $8.345\times$ energy reduction. Compared with a fixed-point approach, and without prior knowledge of the data range, our approach shows an accuracy degradation of 1.93% on MNIST and an accuracy improvement of 28.2% on the Fashion-MNIST data set. Thus, our approach has better compatibility with the input data range.
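
The core optimization named in the abstract is a hybrid logarithmic approximation of the vector dot-product, the kernel that dominates multiplicative-update NMF (in the standard Lee-Seung rules, e.g. $H \leftarrow H \odot \frac{W^{T}V}{W^{T}WH}$, every entry of a factor update is a dot product of non-negative vectors). The abstract does not spell out the HLS design, so the following is a minimal C++ software sketch of one plausible reading of "hybrid": element-wise multiplications use a Mitchell-style logarithmic approximation ($\log_2(1+f) \approx f$), while the accumulation stays in exact floating point. The names `approx_mul` and `hybrid_dot` are illustrative, not from the paper.

```cpp
// Software model of a logarithmically approximated dot product.
// Illustrative sketch only; the paper's actual HLS implementation
// and its exact hybrid partitioning may differ.
#include <cmath>
#include <cstdio>
#include <vector>

// Mitchell-style approximate multiply for non-negative operands
// (NMF factors are non-negative by construction):
//   a = 2^ka * (1 + fa), b = 2^kb * (1 + fb), with fa, fb in [0, 1).
// Using log2(1 + f) ~= f, the product is approximated as
//   2^(ka + kb) * (1 + fa + fb),
// carrying into the exponent when fa + fb >= 1. No multiplier needed.
static float approx_mul(float a, float b) {
    if (a == 0.0f || b == 0.0f) return 0.0f;
    int ka, kb;
    float fa = std::frexp(a, &ka) * 2.0f - 1.0f; // mantissa [1,2) -> f in [0,1)
    float fb = std::frexp(b, &kb) * 2.0f - 1.0f;
    int   k  = (ka - 1) + (kb - 1);              // a = (1+fa)*2^(ka-1), etc.
    float f  = fa + fb;
    if (f >= 1.0f) { f -= 1.0f; ++k; }           // renormalize into [0,1)
    return std::ldexp(1.0f + f, k);              // antilog: 2^k * (1 + f)
}

// "Hybrid" dot product under the assumption above: log-approximate
// element-wise products, exact floating-point accumulation.
static float hybrid_dot(const std::vector<float>& x,
                        const std::vector<float>& y) {
    float acc = 0.0f;
    for (size_t i = 0; i < x.size(); ++i)
        acc += approx_mul(x[i], y[i]);
    return acc;
}

int main() {
    std::vector<float> x = {0.5f, 1.25f, 3.0f};
    std::vector<float> y = {2.0f, 0.75f, 1.5f};
    // Exact dot product: 0.5*2 + 1.25*0.75 + 3*1.5 = 6.4375;
    // Mitchell's approximation always underestimates slightly.
    std::printf("approx dot = %f\n", hybrid_dot(x, y));
    return 0;
}
```

In hardware, the `frexp`/`ldexp` steps correspond to exponent extraction and shifting, so each multiplier is replaced by narrow adders; that is the usual source of the speed and energy savings logarithmic multipliers offer, consistent with the gains reported in the abstract.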