HCMA: Supporting High Concurrency of Memory Accesses with Scratchpad Memory in FPGAs

Yangyang Zhao, Yuhang Liu, Wei Li, Mingyu Chen
{"title":"HCMA: Supporting High Concurrency of Memory Accesses with Scratchpad Memory in FPGAs","authors":"Yangyang Zhao, Yuhang Liu, Wei Li, Mingyu Chen","doi":"10.1109/NAS.2019.8834726","DOIUrl":null,"url":null,"abstract":"Currently many researches focus on new methods of accelerating memory accesses between memory controller and memory modules. However, the absence of an accelerator for memory accesses between CPU and memory controller wastes the performance benefits of new methods. Therefore, we propose a coordinated batch method to support high concurrency of memory accesses (HCMA). Compared to the conventional method of holding outstanding memory access requests in miss status handling registers (MSHRs), HCMA method takes advantage of scratchpad memory in FPGAs or SoCs to circumvent the limitation of MSHR entries. The concurrency of requests is only limited by the capacity of scratchpad memory. Moreover, to avoid the higher latency when searching more entries, we design an efficient coordinating mechanism based on circular queues.We evaluate the performance of HCMA method on an MP-SoC FPGA platform. Compared to conventional methods based on MSHRs, HCMA method supports ten times of concurrent memory accesses (from 10 to 128 entries on our evaluation platform). HCMA method achieves up to 2.72× memory bandwidth utilization for applications that access memory with massive fine-grained random requests, and to 3.46× memory bandwidth utilization for stream-based memory accesses. For real applications like CG, our method improves speedup performance by 29.87%.","PeriodicalId":230796,"journal":{"name":"2019 IEEE International Conference on Networking, Architecture and Storage (NAS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Conference on Networking, Architecture and Storage (NAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NAS.2019.8834726","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Currently, much research focuses on new methods of accelerating memory accesses between the memory controller and the memory modules. However, the absence of an accelerator for memory accesses between the CPU and the memory controller wastes the performance benefits of these new methods. Therefore, we propose a coordinated batch method to support high concurrency of memory accesses (HCMA). Compared to the conventional approach of holding outstanding memory access requests in miss status handling registers (MSHRs), the HCMA method takes advantage of scratchpad memory in FPGAs or SoCs to circumvent the limit on the number of MSHR entries; the concurrency of requests is bounded only by the capacity of the scratchpad memory. Moreover, to avoid the higher latency of searching a larger number of entries, we design an efficient coordinating mechanism based on circular queues. We evaluate the performance of the HCMA method on an MP-SoC FPGA platform. Compared to conventional MSHR-based methods, HCMA supports more than ten times as many concurrent memory accesses (from 10 to 128 entries on our evaluation platform). The HCMA method achieves up to 2.72× memory bandwidth utilization for applications that issue massive fine-grained random requests, and up to 3.46× for stream-based memory accesses. For real applications such as CG, our method improves performance by 29.87%.
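The abstract's key mechanism is to hold outstanding requests in a scratchpad-backed circular queue rather than searching MSHR entries. Below is a minimal sketch of that idea, not the paper's implementation: the names (`sp_queue_t`, `hcma_issue`, `hcma_complete`) are hypothetical, and it assumes requests retire in FIFO order so no associative lookup is needed; the 128-entry bound mirrors the evaluation platform's figure but is otherwise an arbitrary constant.

```c
/* Hypothetical sketch of a circular queue of outstanding memory requests
 * held in a fixed, scratchpad-sized buffer. Names and sizes are
 * illustrative, not taken from the paper's implementation. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SP_ENTRIES 128u               /* bounded by scratchpad capacity, not by MSHR count */

typedef struct {
    uint64_t addr;                    /* request address */
    uint32_t tag;                     /* identifier carried back with the response */
} sp_entry_t;

typedef struct {
    sp_entry_t buf[SP_ENTRIES];       /* would reside in scratchpad memory */
    unsigned head, tail, count;
} sp_queue_t;

/* Issue a request: O(1) enqueue, no per-entry search as in an MSHR file. */
static bool hcma_issue(sp_queue_t *q, uint64_t addr, uint32_t tag) {
    if (q->count == SP_ENTRIES) return false;    /* scratchpad full: back-pressure the core */
    q->buf[q->tail] = (sp_entry_t){ .addr = addr, .tag = tag };
    q->tail = (q->tail + 1u) % SP_ENTRIES;
    q->count++;
    return true;
}

/* Retire the oldest outstanding request: O(1) dequeue in FIFO order. */
static bool hcma_complete(sp_queue_t *q, sp_entry_t *done) {
    if (q->count == 0) return false;
    *done = q->buf[q->head];
    q->head = (q->head + 1u) % SP_ENTRIES;
    q->count--;
    return true;
}

int main(void) {
    sp_queue_t q = { .head = 0, .tail = 0, .count = 0 };
    for (uint32_t i = 0; i < 5; i++)
        hcma_issue(&q, 0x1000u + 64u * i, i);    /* a few fine-grained requests */
    sp_entry_t e;
    while (hcma_complete(&q, &e))
        printf("retired tag %u, addr 0x%llx\n", e.tag, (unsigned long long)e.addr);
    return 0;
}
```

The point the sketch highlights is that enqueue and dequeue stay O(1) regardless of queue depth, so scaling the number of outstanding requests does not incur the per-entry search latency that a larger MSHR file would.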