On the Importance of Improving Cache Locality in Application-specific Accelerators via HLS

Yasin Alptekin, Ismail San
Published in: 2020 28th Signal Processing and Communications Applications Conference (SIU), 2020-10-05
DOI: 10.1109/SIU49456.2020.9302114 (https://doi.org/10.1109/SIU49456.2020.9302114)
Citations: 0

Abstract

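Designing a hardware architecture in a low-level hardware description language (Verilog, VHDL) is difficult and time-consuming, especially when the application is complex and memory-intensive. A high-level synthesis (HLS) tool, a topic of active research in several groups, automatically generates an RTL description of the hardware architecture from a high-level (C/C++) program. However, an application to be accelerated in hardware via an HLS tool needs to be written so that the overall memory access latency is reduced, which can be done by simply rewriting the code so that the restructured loops have more locality. In this paper, we present two case studies that reduce memory access latency by reorganizing the memory access pattern of an application accelerated via HLS on hardware that has a cache, thereby improving its locality. We also emphasize the importance of locality for the performance of hardware accelerators with empirical results on a Zynq-based SoC platform.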
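The abstract does not name the two case studies, so the following is only a generic sketch of the kind of loop rewrite it describes, not the authors' code. It assumes a row-major matrix stored in a flat array; the dimension `N` and the function names `sum_column_major` and `sum_row_major` are hypothetical and chosen for illustration. Both functions compute the same sum, but the second interchanges the loops so that the inner loop walks memory with unit stride, which is the access pattern a cache in front of external memory typically serves best.

```c
#include <stddef.h>

#define N 64  /* hypothetical matrix dimension, for illustration only */

/* Poor locality: the inner loop steps through a column, so consecutive
 * accesses are N elements apart in the row-major array and tend to
 * land on different cache lines. */
long sum_column_major(const int *a)
{
    long sum = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            sum += a[i * N + j];
    return sum;
}

/* Better locality: the loops are interchanged so the inner loop reads
 * adjacent elements, reusing each fetched cache line before moving on. */
long sum_row_major(const int *a)
{
    long sum = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            sum += a[i * N + j];
    return sum;
}
```

Either function can be called on an `int` array of `N * N` elements and returns the same value; only the order of memory accesses differs. How much latency the interchange saves depends on the cache configuration and the HLS tool, which the abstract does not specify.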