Exploiting Memory-Level Parallelism in Reconfigurable Accelerators

Shaoyi Cheng, Mingjie Lin, H. Liu, S. Scott, J. Wawrzynek
{"title":"Exploiting Memory-Level Parallelism in Reconfigurable Accelerators","authors":"Shaoyi Cheng, Mingjie Lin, H. Liu, S. Scott, J. Wawrzynek","doi":"10.1109/FCCM.2012.35","DOIUrl":null,"url":null,"abstract":"As memory accesses increasingly limit the overall performance of reconfigurable accelerators, it is important for high level synthesis (HLS) flows to discover and exploit memory-level parallelism. This paper develops 1) a framework where parallelism between memory accesses can be revealed from runtime profile of applications and provided to a high level synthesis flow, and 2) a novel multi-accelerator/multi-cache architecture to support parallel memory accesses, taking advantage of the high aggregated memory bandwidth found in modern FPGA devices. Our experimental results have shown that for 10 accelerators generated from 9 benchmark applications, circuits using our proposed memory structure achieve on average 52% improved performance over accelerators using a traditional memory interface. We believe that our study represents a solid advance towards achieving memory-parallel embedded computing on hybrid CPU+FPGA platforms.","PeriodicalId":226197,"journal":{"name":"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2012.35","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

As memory accesses increasingly limit the overall performance of reconfigurable accelerators, it is important for high level synthesis (HLS) flows to discover and exploit memory-level parallelism. This paper develops 1) a framework where parallelism between memory accesses can be revealed from runtime profile of applications and provided to a high level synthesis flow, and 2) a novel multi-accelerator/multi-cache architecture to support parallel memory accesses, taking advantage of the high aggregated memory bandwidth found in modern FPGA devices. Our experimental results have shown that for 10 accelerators generated from 9 benchmark applications, circuits using our proposed memory structure achieve on average 52% improved performance over accelerators using a traditional memory interface. We believe that our study represents a solid advance towards achieving memory-parallel embedded computing on hybrid CPU+FPGA platforms.
在可重构加速器中利用内存级并行性
由于内存访问越来越多地限制了可重构加速器的整体性能,因此发现和利用内存级并行性对于高级综合流(HLS)非常重要。本文开发了1)一个框架,其中内存访问之间的并行性可以从应用程序的运行时配置文件中显示出来,并提供给高层次的合成流;2)一个新的多加速器/多缓存架构来支持并行内存访问,利用现代FPGA设备中发现的高聚合内存带宽。我们的实验结果表明,对于由9个基准应用程序生成的10个加速器,使用我们提出的存储器结构的电路比使用传统存储器接口的加速器平均提高了52%的性能。我们相信我们的研究代表了在混合CPU+FPGA平台上实现内存并行嵌入式计算的坚实进步。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信