1K manycore FPGA shared memory architecture for SOC (abstract only)

Y. Ben-Asher, Jacob Gendel, Gadi Haber, Oren Segal, Yousef Shajrawi

Published in: Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
Publication date: 2014-02-26
DOI: 10.1145/2554688.2554699 (https://doi.org/10.1145/2554688.2554699)

Abstract

Manycore shared memory architectures hold significant promise for speeding up and simplifying SOCs. Using many small homogeneous cores will allow the hardware accelerators of SOCs to be replaced by parallel algorithms that communicate through shared memory. Currently, shared memory is realized by maintaining cache consistency across the cores, with all the connected cores caching a single main memory module. This approach, though used today, is unlikely to scale to the high number of cores needed for highly parallel SOCs. We therefore consider a theoretical scheme for shared memory in which the shared address space is divided among a set of memory modules, and a communication network allows each core to access every such module in parallel. Load balancing between the memory modules is obtained by rehashing the memory address space. We have designed a simple generic shared memory architecture, synthesized it for 2, 4, 8, ..., 1024 cores on a Virtex-7 FPGA, and evaluated it on several parallel programs. The synthesis results and the execution measurements show that, for the FPGA, all problematic aspects of this construction can be resolved. For example, unlike in ASICs, the growing complexity of the communication network is absorbed by the FPGA's routing grid and by its routing mechanism. This makes this type of architecture particularly suitable for FPGAs. We used modified 32-bit PACOBLAZE cores and tested different parameters of this architecture, verifying its ability to achieve high speedups. The results suggest that rehashing is not essential and that a single hash function suffices (in contrast to the family of universal hash functions required by the theoretical construction).
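
The address-to-module mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' design: the abstract does not name the hash function, so the multiplicative hash, the multiplier constant, the choice of 1024 memory modules, and the strided access pattern below are all assumptions made for illustration. The sketch compares plain low-order interleaving with one fixed hash function and reports the heaviest load that falls on any single module.

```python
from collections import Counter

# Assumed for illustration: 1024 memory modules (a power of two), 32-bit addresses.
NUM_MODULES = 1024
MODULE_BITS = 10   # log2(NUM_MODULES)
WORD_BITS = 32     # matches the 32-bit cores mentioned in the abstract

# Hypothetical odd multiplier (a common multiplicative-hashing constant);
# it is not taken from the paper.
MULTIPLIER = 2654435761


def module_interleaved(addr: int) -> int:
    """Low-order interleaving: the module index is addr mod NUM_MODULES."""
    return addr & (NUM_MODULES - 1)


def module_hashed(addr: int) -> int:
    """Multiplicative hash: top MODULE_BITS bits of (addr * MULTIPLIER) mod 2^32."""
    product = (addr * MULTIPLIER) & ((1 << WORD_BITS) - 1)
    return product >> (WORD_BITS - MODULE_BITS)


def max_load(addresses, mapping) -> int:
    """Largest number of requests that land on any single memory module."""
    return max(Counter(mapping(a) for a in addresses).values())


# Assumed access pattern: 1024 cores each read one word at a stride of 1024 words.
strided = [core * 1024 for core in range(1024)]

if __name__ == "__main__":
    # Under interleaving every address is a multiple of 1024, so all
    # requests collide on module 0.
    print("interleaved:", max_load(strided, module_interleaved))
    print("hashed:     ", max_load(strided, module_hashed))
```

With interleaving, the strided pattern sends every request to the same module, which is the kind of imbalance that hashing the address space is meant to avoid. In this sketch, rehashing would correspond to swapping in a different multiplier; the abstract's finding is that in the authors' experiments a single fixed function was enough.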