Analytical and measured sustained bandwidth for an FPGA-based processor

2012 Proceedings of IEEE Southeastcon. Pub Date: 2012-03-15. DOI: 10.1109/SECON.2012.6196914

G. R. Morris, A. Silas, K. Abed


Citations: 3

Abstract

Previous research has shown that floating-point kernels mapped onto field-programmable gate array (FPGA)-based high-performance reconfigurable computers (HPRCs) must satisfy a variety of heuristics and rules of thumb to achieve a speedup over their software counterparts. One such rule of thumb is that applications with large or irregular-stride memory access, e.g., sparse matrix kernels, can run significantly faster on HPRCs. This paper uses a simple sparse matrix Jacobi iterative solver to demonstrate why this speedup can occur. Using a well-known off-the-shelf sustained bandwidth measurement tool and a port of that tool to an FPGA-based computer, the paper shows that, unlike cache-based general-purpose processors, FPGA-based processors do not suffer significant bandwidth degradation at large data sizes. The paper then validates these observations with both experimentally measured and analytically derived runtimes for the Jacobi solver. The results show that 1) unlike that of a cache-based general-purpose processor, the FPGA bandwidth is constant across the entire range of sparse data sets considered, and 2) the experimentally determined runtimes for both the software and the FPGA-based Jacobi kernels are in very close agreement with the analytically derived runtimes.
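To make the kernel and the idea of an analytically derived runtime concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not name the storage format or give the runtime model, so the compressed sparse row (CSR) layout, the function names `jacobi_csr` and `estimated_runtime_seconds`, and the byte-counting formula are all assumptions made for illustration. The model simply divides the bytes moved per sweep by a sustained bandwidth figure, which is the kind of number a bandwidth measurement tool reports.

```python
def jacobi_csr(values, col_idx, row_ptr, b, iterations):
    """Run a fixed number of Jacobi sweeps on A x = b, with A stored in CSR.

    CSR storage is an assumption; the abstract does not state the format.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        x_new = [0.0] * n
        for i in range(n):
            diag = 0.0
            off_diag_sum = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = values[k]
                else:
                    # x[j] is an indirect read: the irregular-stride access
                    # pattern that the abstract's rule of thumb refers to.
                    off_diag_sum += values[k] * x[j]
            x_new[i] = (b[i] - off_diag_sum) / diag
        x = x_new
    return x


def estimated_runtime_seconds(nnz, n, iterations, bandwidth_bytes_per_s,
                              value_bytes=8, index_bytes=4):
    """Hypothetical bandwidth-bound runtime model (not the paper's model).

    Per nonzero: one matrix value, one column index, one x entry.
    Per row: one b entry, one row pointer, one written x_new entry.
    """
    per_nonzero = 2 * value_bytes + index_bytes
    per_row = 2 * value_bytes + index_bytes
    bytes_per_sweep = nnz * per_nonzero + n * per_row
    return iterations * bytes_per_sweep / bandwidth_bytes_per_s


if __name__ == "__main__":
    # A = [[4, 1], [2, 5]], b = [6, 12]; the exact solution is x = [1, 2].
    values = [4.0, 1.0, 2.0, 5.0]
    col_idx = [0, 1, 0, 1]
    row_ptr = [0, 2, 4]
    print(jacobi_csr(values, col_idx, row_ptr, [6.0, 12.0], 50))
    # Illustrative sizes and bandwidth only; not measurements from the paper.
    print(estimated_runtime_seconds(nnz=10**6, n=10**5, iterations=100,
                                    bandwidth_bytes_per_s=1e9))
```

Under this model, a constant sustained bandwidth gives a runtime that grows linearly with the number of nonzeros. A bandwidth that drops once the working set outgrows the cache makes the same kernel slow down disproportionately on large data sets.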


