TputCache: High-frequency, multi-way cache for high-throughput FPGA applications

Aaron Severance, G. Lemieux
2013 23rd International Conference on Field Programmable Logic and Applications, published 2013-10-24. DOI: 10.1109/FPL.2013.6645537 (https://doi.org/10.1109/FPL.2013.6645537)
Citations: 7

Abstract

Throughput processing uses many contexts or threads to solve multiple problems or subproblems in parallel, where the problem is large enough that latency can be tolerated. Supporting many concurrent executions requires bandwidth, however, and using multiple external memory channels is costly. For small working sets, FPGA designers can use on-chip BRAMs to achieve the necessary bandwidth without increasing system cost. Designing algorithms around fixed-size local memories is difficult, however, because there is no graceful fallback if the problem size exceeds the amount of local memory. This paper introduces TputCache, a cache designed to meet the needs of throughput processing on FPGAs, giving the throughput performance of on-chip BRAMs when the problem size fits in local memory. The design uses a replay-based architecture to achieve high frequency with very low resource overhead.
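
The abstract does not describe how the replay mechanism works, so the following is only a toy software model of the general idea, not the TputCache design: a request that misses is sent back to the request queue and retried later, so the pipeline never stalls while a line is being filled. Everything here is an assumption made for illustration, including the function name `simulate_replay_cache`, the direct-mapped organization (the paper's cache is multi-way), the parameters `num_lines`, `line_words`, and `fill_latency`, and the rule that a newly filled line is protected until its first hit.

```python
from collections import deque


def simulate_replay_cache(addresses, num_lines=4, line_words=4, fill_latency=3):
    """Toy model of a non-stalling, replay-on-miss cache (hypothetical, direct-mapped).

    Returns (completed addresses in completion order, replay count, cycles used).
    """
    tags = [None] * num_lines        # tag held (or being fetched) by each line
    fill_done = [None] * num_lines   # cycle at which an in-flight fill finishes
    locked = [False] * num_lines     # line reserved until it serves its first hit
    queue = deque(addresses)         # requests waiting to enter the pipeline
    completed = []
    replays = 0
    cycle = 0

    while queue:
        cycle += 1
        # Mark fills that have finished by this cycle as valid.
        for i in range(num_lines):
            if fill_done[i] is not None and fill_done[i] <= cycle:
                fill_done[i] = None

        addr = queue.popleft()       # exactly one request enters per cycle
        line = addr // line_words
        idx, tag = line % num_lines, line // num_lines

        if tags[idx] == tag and fill_done[idx] is None:
            locked[idx] = False      # hit: the line may now be evicted again
            completed.append(addr)
        else:
            if not locked[idx]:      # start a fill unless the line is reserved
                tags[idx] = tag
                fill_done[idx] = cycle + fill_latency
                locked[idx] = True
            queue.append(addr)       # replay the request instead of stalling
            replays += 1

    return completed, replays, cycle


if __name__ == "__main__":
    done, replays, cycles = simulate_replay_cache([0, 1, 2, 3])
    print(f"completed={done} replays={replays} cycles={cycles}")
```

In this model, requests can complete out of order, because a replayed request re-enters behind requests that arrived after it. That fits the throughput-processing setting described above, where independent threads tolerate latency. The reservation flag keeps two addresses that map to the same line from evicting each other indefinitely.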