Shaoyi Cheng, Mingjie Lin, H. Liu, S. Scott, J. Wawrzynek
{"title":"在可重构加速器中利用内存级并行性","authors":"Shaoyi Cheng, Mingjie Lin, H. Liu, S. Scott, J. Wawrzynek","doi":"10.1109/FCCM.2012.35","DOIUrl":null,"url":null,"abstract":"As memory accesses increasingly limit the overall performance of reconfigurable accelerators, it is important for high level synthesis (HLS) flows to discover and exploit memory-level parallelism. This paper develops 1) a framework where parallelism between memory accesses can be revealed from runtime profile of applications and provided to a high level synthesis flow, and 2) a novel multi-accelerator/multi-cache architecture to support parallel memory accesses, taking advantage of the high aggregated memory bandwidth found in modern FPGA devices. Our experimental results have shown that for 10 accelerators generated from 9 benchmark applications, circuits using our proposed memory structure achieve on average 52% improved performance over accelerators using a traditional memory interface. We believe that our study represents a solid advance towards achieving memory-parallel embedded computing on hybrid CPU+FPGA platforms.","PeriodicalId":226197,"journal":{"name":"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Exploiting Memory-Level Parallelism in Reconfigurable Accelerators\",\"authors\":\"Shaoyi Cheng, Mingjie Lin, H. Liu, S. Scott, J. Wawrzynek\",\"doi\":\"10.1109/FCCM.2012.35\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As memory accesses increasingly limit the overall performance of reconfigurable accelerators, it is important for high level synthesis (HLS) flows to discover and exploit memory-level parallelism. This paper develops 1) a framework where parallelism between memory accesses can be revealed from runtime profile of applications and provided to a high level synthesis flow, and 2) a novel multi-accelerator/multi-cache architecture to support parallel memory accesses, taking advantage of the high aggregated memory bandwidth found in modern FPGA devices. Our experimental results have shown that for 10 accelerators generated from 9 benchmark applications, circuits using our proposed memory structure achieve on average 52% improved performance over accelerators using a traditional memory interface. 
We believe that our study represents a solid advance towards achieving memory-parallel embedded computing on hybrid CPU+FPGA platforms.\",\"PeriodicalId\":226197,\"journal\":{\"name\":\"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines\",\"volume\":\"40 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-04-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FCCM.2012.35\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2012.35","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Exploiting Memory-Level Parallelism in Reconfigurable Accelerators
As memory accesses increasingly limit the overall performance of reconfigurable accelerators, it is important for high-level synthesis (HLS) flows to discover and exploit memory-level parallelism. This paper develops 1) a framework in which parallelism between memory accesses is revealed from the runtime profiles of applications and provided to a high-level synthesis flow, and 2) a novel multi-accelerator/multi-cache architecture that supports parallel memory accesses, taking advantage of the high aggregate memory bandwidth of modern FPGA devices. Our experimental results show that for 10 accelerators generated from 9 benchmark applications, circuits using the proposed memory structure achieve on average 52% better performance than accelerators using a traditional memory interface. We believe that our study represents a solid advance towards achieving memory-parallel embedded computing on hybrid CPU+FPGA platforms.
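The abstract describes revealing parallelism between memory accesses from runtime profiles so that independent access streams can be mapped to separate caches. As a rough conceptual illustration only (not the paper's actual framework; all names and the trace format here are hypothetical), the sketch below checks whether two profiled address streams ever touch the same locations; streams that stay disjoint in the profile are candidates for being served by independent caches in parallel.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <vector>

// Hypothetical profiling record: the addresses touched by one static
// memory operation (e.g., a load or store in a loop body) during a run.
struct AccessStream {
    const char* name;
    std::vector<uint64_t> addresses;
};

// Heuristic (not a proof): two streams are candidates for parallel,
// separately cached access if their profiled address sets never intersect.
bool may_use_separate_caches(AccessStream a, AccessStream b) {
    std::sort(a.addresses.begin(), a.addresses.end());
    std::sort(b.addresses.begin(), b.addresses.end());
    std::vector<uint64_t> overlap;
    std::set_intersection(a.addresses.begin(), a.addresses.end(),
                          b.addresses.begin(), b.addresses.end(),
                          std::back_inserter(overlap));
    return overlap.empty();
}

int main() {
    // Toy trace: a read stream and a write stream touching disjoint buffers.
    AccessStream in  {"load x[i]",  {0x1000, 0x1008, 0x1010}};
    AccessStream out {"store y[i]", {0x2000, 0x2008, 0x2010}};
    std::cout << (may_use_separate_caches(in, out)
                      ? "streams disjoint: candidates for separate caches\n"
                      : "streams conflict: keep on one memory port\n");
}
```

In spirit, a profile-guided HLS flow would apply a test of this kind across the memory operations of an accelerated region and use the result to decide how many cache ports to instantiate; the actual analysis and architecture are detailed in the paper itself.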