HCMA: Supporting High Concurrency of Memory Accesses with Scratchpad Memory in FPGAs

Yangyang Zhao, Yuhang Liu, Wei Li, Mingyu Chen
{"title":"HCMA: Supporting High Concurrency of Memory Accesses with Scratchpad Memory in FPGAs","authors":"Yangyang Zhao, Yuhang Liu, Wei Li, Mingyu Chen","doi":"10.1109/NAS.2019.8834726","DOIUrl":null,"url":null,"abstract":"Currently many researches focus on new methods of accelerating memory accesses between memory controller and memory modules. However, the absence of an accelerator for memory accesses between CPU and memory controller wastes the performance benefits of new methods. Therefore, we propose a coordinated batch method to support high concurrency of memory accesses (HCMA). Compared to the conventional method of holding outstanding memory access requests in miss status handling registers (MSHRs), HCMA method takes advantage of scratchpad memory in FPGAs or SoCs to circumvent the limitation of MSHR entries. The concurrency of requests is only limited by the capacity of scratchpad memory. Moreover, to avoid the higher latency when searching more entries, we design an efficient coordinating mechanism based on circular queues.We evaluate the performance of HCMA method on an MP-SoC FPGA platform. Compared to conventional methods based on MSHRs, HCMA method supports ten times of concurrent memory accesses (from 10 to 128 entries on our evaluation platform). HCMA method achieves up to 2.72× memory bandwidth utilization for applications that access memory with massive fine-grained random requests, and to 3.46× memory bandwidth utilization for stream-based memory accesses. For real applications like CG, our method improves speedup performance by 29.87%.","PeriodicalId":230796,"journal":{"name":"2019 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Conference on Networking, Architecture and Storage (NAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NAS.2019.8834726","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Currently, much research focuses on new methods of accelerating memory accesses between the memory controller and the memory modules. However, the absence of an accelerator for memory accesses between the CPU and the memory controller wastes the performance benefits of these new methods. Therefore, we propose a coordinated batch method to support high concurrency of memory accesses (HCMA). Compared to the conventional approach of holding outstanding memory access requests in miss status handling registers (MSHRs), the HCMA method takes advantage of scratchpad memory in FPGAs or SoCs to circumvent the limit on the number of MSHR entries; the concurrency of requests is bounded only by the capacity of the scratchpad memory. Moreover, to avoid the higher latency of searching a larger number of entries, we design an efficient coordinating mechanism based on circular queues. We evaluate the performance of the HCMA method on an MP-SoC FPGA platform. Compared to conventional MSHR-based methods, HCMA supports more than ten times as many concurrent memory accesses (from 10 to 128 entries on our evaluation platform). The HCMA method achieves up to 2.72× memory bandwidth utilization for applications that issue massive fine-grained random requests, and up to 3.46× for stream-based memory accesses. For real applications such as CG, our method improves performance by 29.87%.
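The abstract's key mechanism is to hold outstanding requests in a scratchpad-backed circular queue rather than searching MSHR entries. Below is a minimal sketch of that idea, not the paper's implementation: the names (`sp_queue_t`, `hcma_issue`, `hcma_complete`) are hypothetical, and it assumes requests retire in FIFO order so no associative lookup is needed; the 128-entry bound mirrors the evaluation platform's figure but is otherwise an arbitrary constant.

```c
/* Hypothetical sketch of a circular queue of outstanding memory requests
 * held in a fixed, scratchpad-sized buffer. Names and sizes are
 * illustrative, not taken from the paper's implementation. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SP_ENTRIES 128u               /* bounded by scratchpad capacity, not by MSHR count */

typedef struct {
    uint64_t addr;                    /* request address */
    uint32_t tag;                     /* identifier carried back with the response */
} sp_entry_t;

typedef struct {
    sp_entry_t buf[SP_ENTRIES];       /* would reside in scratchpad memory */
    unsigned head, tail, count;
} sp_queue_t;

/* Issue a request: O(1) enqueue, no per-entry search as in an MSHR file. */
static bool hcma_issue(sp_queue_t *q, uint64_t addr, uint32_t tag) {
    if (q->count == SP_ENTRIES) return false;    /* scratchpad full: back-pressure the core */
    q->buf[q->tail] = (sp_entry_t){ .addr = addr, .tag = tag };
    q->tail = (q->tail + 1u) % SP_ENTRIES;
    q->count++;
    return true;
}

/* Retire the oldest outstanding request: O(1) dequeue in FIFO order. */
static bool hcma_complete(sp_queue_t *q, sp_entry_t *done) {
    if (q->count == 0) return false;
    *done = q->buf[q->head];
    q->head = (q->head + 1u) % SP_ENTRIES;
    q->count--;
    return true;
}

int main(void) {
    sp_queue_t q = { .head = 0, .tail = 0, .count = 0 };
    for (uint32_t i = 0; i < 5; i++)
        hcma_issue(&q, 0x1000u + 64u * i, i);    /* a few fine-grained requests */
    sp_entry_t e;
    while (hcma_complete(&q, &e))
        printf("retired tag %u, addr 0x%llx\n", e.tag, (unsigned long long)e.addr);
    return 0;
}
```

The point the sketch highlights is that enqueue and dequeue stay O(1) regardless of queue depth, so scaling the number of outstanding requests does not incur the per-entry search latency that a larger MSHR file would.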