On the Importance of Improving Cache Locality in Application-specific Accelerators via HLS
Authors: Yasin Alptekin, Ismail San
Published in: 2020 28th Signal Processing and Communications Applications Conference (SIU), 2020-10-05
DOI: 10.1109/SIU49456.2020.9302114 (https://doi.org/10.1109/SIU49456.2020.9302114)
Citation count: 0
Abstract
Designing a hardware architecture in a low-level hardware description language (Verilog, VHDL) is difficult and time-consuming, especially when the application is complex and memory-intensive. A high-level synthesis (HLS) tool, a topic of active research in several groups, automatically generates an RTL description of the hardware architecture from a high-level (C/C++) program. However, an application to be accelerated in hardware via an HLS tool should be written with memory access latency in mind: rewriting the code so that the restructured loops exhibit more locality can reduce the overall memory access latency. In this paper, we present two case studies that decrease memory access latency by reorganizing the memory access pattern of an application accelerated via HLS on hardware that has a cache, thereby improving its locality. We also emphasize the importance of locality for the performance of hardware accelerators with empirical results on a Zynq-based SoC platform.
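
To make the idea of restructuring a loop for locality concrete, the following is a minimal sketch in C. It is not one of the paper's two case studies, which the abstract does not name. Matrix multiplication, the dimension `N`, and the function names `matmul_ijk` and `matmul_ikj` are all assumptions chosen for illustration. Both functions compute the same product; only the loop order, and therefore the memory access pattern, differs.

```c
#include <stddef.h>

#define N 64  /* hypothetical matrix dimension, chosen only for illustration */

/* Baseline loop order (i, j, k): the innermost loop reads b[k][j] with k
 * varying, i.e. it walks down a column of a row-major array. Consecutive
 * reads are N floats apart, so they tend to fall on different cache lines. */
void matmul_ijk(float a[N][N], float b[N][N], float c[N][N])
{
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }
}

/* Interchanged loop order (i, k, j): the innermost loop now varies j, so
 * both b[k][j] and c[i][j] are accessed with unit stride along a row, and
 * each fetched cache line is fully used before moving on. */
void matmul_ikj(float a[N][N], float b[N][N], float c[N][N])
{
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++)
            c[i][j] = 0.0f;
        for (size_t k = 0; k < N; k++) {
            float a_ik = a[i][k];  /* reused across the whole inner loop */
            for (size_t j = 0; j < N; j++)
                c[i][j] += a_ik * b[k][j];
        }
    }
}
```

Calling both functions on the same inputs should yield the same output matrix, since each element accumulates its terms in the same order of `k`. How much latency the interchanged version actually saves depends on the cache configuration and on how the HLS tool schedules the loops, so any benefit would need to be measured on the target platform.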