On the Importance of Improving Cache Locality in Application-specific Accelerators via HLS

2020 28th Signal Processing and Communications Applications Conference (SIU), Pub Date: 2020-10-05, DOI: 10.1109/SIU49456.2020.9302114

Yasin Alptekin, Ismail San


Citations: 0

Abstract

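Designing a hardware architecture in a low-level hardware description language (Verilog, VHDL) is a difficult and time-consuming task, especially for complex, memory-intensive applications. High-level synthesis (HLS) tools, an active research topic in several groups, automatically generate an RTL description of the hardware architecture from a high-level (C/C++) program. However, an application accelerated in hardware via an HLS tool must still be written with its memory behavior in mind: simply rewriting the code so that the restructured loops exhibit more locality can reduce the overall memory access latency. In this paper, we present two case studies that reduce memory access latency by reorganizing the memory access pattern of an application accelerated via HLS on hardware with a cache, thereby improving its locality. We also emphasize the importance of locality for the performance of hardware accelerators with empirical results on a Zynq-based SoC platform.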
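The abstract does not describe the two case studies in detail, so the following is only a generic sketch of the kind of rewrite it refers to: loop interchange on a row-major integer matrix product, written in plain C (the input language the abstract names for HLS tools). The choice of matrix multiplication, the size `N`, and the function names `matmul_ijk`, `matmul_ikj` and `loop_orders_agree` are hypothetical and not taken from the paper; no HLS pragmas or tool-specific interfaces are assumed.

```c
#define N 64 /* hypothetical matrix size, chosen only for illustration */

/* Original loop order (i, j, k): the innermost loop reads B[k][j] with k
   varying, i.e. it walks DOWN a column of a row-major array. Consecutive
   reads are N * sizeof(int) bytes apart, so most of them fall in a
   different cache line. */
void matmul_ijk(int A[N][N], int B[N][N], int C[N][N]) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            int acc = 0;
            for (int k = 0; k < N; k++) {
                acc += A[i][k] * B[k][j];
            }
            C[i][j] = acc;
        }
    }
}

/* Interchanged loop order (i, k, j): the innermost loop now walks along a
   row of both B and C, so consecutive accesses are adjacent in memory and
   share cache lines. The arithmetic is the same; only the access pattern
   changes. */
void matmul_ikj(int A[N][N], int B[N][N], int C[N][N]) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            C[i][j] = 0;
        }
        for (int k = 0; k < N; k++) {
            int a = A[i][k]; /* reused across the whole inner loop */
            for (int j = 0; j < N; j++) {
                C[i][j] += a * B[k][j];
            }
        }
    }
}

/* Fills two deterministic matrices, runs both versions and returns 1 if
   they produce identical results (integer arithmetic, so the comparison
   is exact). */
int loop_orders_agree(void) {
    static int A[N][N], B[N][N], C1[N][N], C2[N][N];
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            A[i][j] = (i + 2 * j) % 7;
            B[i][j] = (3 * i + j) % 5;
        }
    }
    matmul_ijk(A, B, C1);
    matmul_ikj(A, B, C2);
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            if (C1[i][j] != C2[i][j]) {
                return 0;
            }
        }
    }
    return 1;
}
```

Both functions compute the same product, which `loop_orders_agree()` checks; the difference is that the i-k-j order streams through consecutive addresses in its innermost loop and keeps `A[i][k]` in a register. How much this helps on a given HLS tool and cache-backed memory interface depends on how the tool maps the arrays to memory, which this sketch does not model.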



