Loop Optimizations of MGS-QRD Algorithm for FPGA High-Level Synthesis

Chong Yeam Tan, C. Y. Ooi, N. Ismail
2019 32nd IEEE International System-on-Chip Conference (SOCC), published 2019-09-01
DOI: 10.1109/SOCC46988.2019.1570548480 (https://doi.org/10.1109/SOCC46988.2019.1570548480)
Citations: 2

Abstract

The best-known Modified Gram-Schmidt QR decomposition (MGS-QRD) algorithm contains many data, memory, loop, and control dependencies that hinder high-level synthesis (HLS) from optimizing it. We therefore present a well-formed algorithm structure that reduces latency and hardware resources. We also present a second MGS-QRD algorithm that further reduces DSP usage and supports larger QR decomposition sizes. The proposed algorithms achieve better overall performance than the best-known MGS-QRD algorithm. Mapped to an Intel Arria 10 FPGA device, the first proposed algorithm achieves an implemented system latency of 0.53 us for an 8x8 real QRD, and the second proposed algorithm achieves 0.59 us. We also provide the HLS optimization steps and dependence analysis used to improve performance, which show an approximately 44-fold increase in QRD throughput.
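For orientation, the textbook MGS-QRD recurrence can be sketched as below. This is a minimal Python sketch of the standard algorithm, not the authors' proposed structure or their HLS source; the function name `mgs_qr`, the list-of-rows input, and the assumption of full column rank are illustrative choices made here. The comments mark where loop-carried dependencies arise in the standard formulation.

```python
import math


def mgs_qr(a):
    """Textbook Modified Gram-Schmidt QR of an m x n matrix given as a list of rows.

    Assumes full column rank, so no diagonal entry of R is zero.
    Returns (Q, R) with Q as a list of rows (m x n) and R upper triangular (n x n).
    """
    m, n = len(a), len(a[0])
    # Column-major working copy: v[j] is column j of the input.
    v = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    q = [[0.0] * m for _ in range(n)]  # q[k] is column k of Q
    r = [[0.0] * n for _ in range(n)]

    for k in range(n):
        # Norm of column k: a sum reduction followed by a square root.
        r[k][k] = math.sqrt(sum(x * x for x in v[k]))
        # Normalisation: every element waits on the norm above.
        q[k] = [x / r[k][k] for x in v[k]]
        # Update the trailing columns. Iteration k + 1 of the outer loop
        # reads v[k + 1], which is written here, so the outer loop carries
        # a dependency from one iteration to the next.
        for j in range(k + 1, n):
            r[k][j] = sum(q[k][i] * v[j][i] for i in range(m))
            v[j] = [v[j][i] - r[k][j] * q[k][i] for i in range(m)]

    q_rows = [[q[j][i] for j in range(n)] for i in range(m)]
    return q_rows, r
```

The inner loop over `j` has independent iterations for a fixed `k`, while the outer loop over `k` is serialized by the column updates; this distinction is the kind of information a dependence analysis has to establish before loops can be pipelined or unrolled.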