Analytical and measured sustained bandwidth for an FPGA-based processor

G. R. Morris, A. Silas, K. Abed
Published in: 2012 Proceedings of IEEE Southeastcon
Publication date: 2012-03-15
DOI: 10.1109/SECON.2012.6196914 (https://doi.org/10.1109/SECON.2012.6196914)
Citation count: 3

Abstract

Previous research has shown that floating-point kernels mapped onto field-programmable gate array (FPGA)-based high-performance reconfigurable computers (HPRCs) must satisfy a variety of heuristics and rules of thumb to achieve a speedup over their software counterparts. One such rule of thumb is that applications with large or irregular stride memory access, e.g., sparse matrix kernels, can run significantly faster on HPRCs. This paper, by way of a simple sparse matrix Jacobi iterative solver, demonstrates why this speedup can occur. Using a well-known off-the-shelf sustained bandwidth measurement tool and a port of that tool onto an FPGA-based computer, this paper reveals that, unlike cache-based general purpose processors, FPGA-based processors do not suffer from significant bandwidth degradation at large data sizes. The paper then validates these observations with both experimentally measured and analytically derived runtimes for the solver. The results show that 1) unlike that of a cache-based general purpose processor, the FPGA bandwidth is constant across the entire range of considered sparse data sets, and 2) the experimentally determined runtimes for both the software and FPGA-based Jacobi kernels are in very close agreement with the analytically derived runtimes.
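
To make the kernel in question concrete, the following is a minimal sketch of a sparse Jacobi sweep together with a bandwidth-bound runtime estimate. It is not the authors' implementation. The compressed sparse row (CSR) layout, the function names `jacobi_csr` and `estimated_sweep_seconds`, and the byte-counting model (every operand fetched from memory once per sweep, 8-byte values, 4-byte indices) are all assumptions made for illustration.

```python
def jacobi_csr(values, col_idx, row_ptr, b, iterations):
    """Run a fixed number of Jacobi sweeps on A x = b, with A stored in CSR.

    Assumes every row has a nonzero diagonal entry (illustrative sketch only).
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x_new = [0.0] * n
        for i in range(n):
            diag = 0.0
            acc = b[i]
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = values[k]
                else:
                    # x[j] is the irregular-stride access: j follows the
                    # sparsity pattern rather than a fixed stride.
                    acc -= values[k] * x[j]
            x_new[i] = acc / diag
        x = x_new
    return x


def estimated_sweep_seconds(nnz, n, bandwidth_bytes_per_s,
                            value_bytes=8, index_bytes=4):
    """Estimate the time of one sweep as bytes moved / sustained bandwidth.

    Assumed traffic model (hypothetical, not taken from the paper):
      per nonzero: matrix value + column index + one x[j] read
      per row:     b[i] read + x_new[i] write + one row pointer read
    """
    per_nonzero = value_bytes + index_bytes + value_bytes
    per_row = value_bytes + value_bytes + index_bytes
    bytes_moved = nnz * per_nonzero + n * per_row
    return bytes_moved / bandwidth_bytes_per_s
```

Under this simplified model, a sustained bandwidth that stays constant as the data set grows gives a runtime that scales linearly with the number of nonzeros, whereas a bandwidth that drops once the working set exceeds the cache makes the runtime grow faster than linearly.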