Exploiting Irregular Memory Parallelism in Quasi-Stencils through Nonlinear Transformation

Juan Escobedo, Mingjie Lin
{"title":"Exploiting Irregular Memory Parallelism in Quasi-Stencils through Nonlinear Transformation","authors":"Juan Escobedo, Mingjie Lin","doi":"10.1109/FCCM.2019.00039","DOIUrl":null,"url":null,"abstract":"Non-stencil kernels with irregular memory accesses pose unique challenges to achieving high computing performance and hardware efficiency in high-level synthesis (HLS) of FPGA. We present a highly versatile and systematic approach to effectively synthesizing a special and important subset of non-stencil computing kernels, quasi-stencils, which possess the mathematical property that, if studied in a particular kind of high-dimensional space corresponding to the prime factorization space, the distance between the memory accesses during each kernel iteration becomes constant and such an irregular non-stencil can be considered as a stencil. This opens the door to exploiting a vast array of existing memory optimization algorithms, such as memory partitioning/banking and data reuse, originally designed for the standard stencil-based kernel computing, therefore offering totally new opportunity to effectively synthesizing irregular non-stencil kernels. We show the feasibility of our approach implementing our methodology in a KC705 Xilinx FPGA board and tested it with several custom code segments that meet the quasi-stencil requirement vs some of the state-of the art methods in memory partitioning. We achieve significant reduction in partition factor, and perhaps more importantly making it proportional to the number of memory accesses instead of depending on the problem size with the cost of some wasted space.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2019.00039","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Non-stencil kernels with irregular memory accesses pose unique challenges to achieving high computing performance and hardware efficiency in high-level synthesis (HLS) of FPGA. We present a highly versatile and systematic approach to effectively synthesizing a special and important subset of non-stencil computing kernels, quasi-stencils, which possess the mathematical property that, if studied in a particular kind of high-dimensional space corresponding to the prime factorization space, the distance between the memory accesses during each kernel iteration becomes constant and such an irregular non-stencil can be considered as a stencil. This opens the door to exploiting a vast array of existing memory optimization algorithms, such as memory partitioning/banking and data reuse, originally designed for the standard stencil-based kernel computing, therefore offering totally new opportunity to effectively synthesizing irregular non-stencil kernels. We show the feasibility of our approach implementing our methodology in a KC705 Xilinx FPGA board and tested it with several custom code segments that meet the quasi-stencil requirement vs some of the state-of the art methods in memory partitioning. We achieve significant reduction in partition factor, and perhaps more importantly making it proportional to the number of memory accesses instead of depending on the problem size with the cost of some wasted space.
利用非线性变换开发准模板的不规则存储并行性
在FPGA的高级综合(high-level synthesis, HLS)中,具有不规则存储器访问的非模板内核对实现高计算性能和硬件效率提出了独特的挑战。我们提出了一种高度通用和系统的方法来有效地综合非模板计算核的一个特殊而重要的子集——准模板。准模板具有这样的数学性质:如果在与质因数分解空间相对应的特定高维空间中进行研究,则每次内核迭代期间存储器访问之间的距离是恒定的,这样的不规则非模板可以被认为是一个模板。这为开发大量现有的内存优化算法打开了大门,例如内存分区/银行和数据重用,这些算法最初是为标准的基于模板的内核计算设计的,因此为有效地合成不规则的非模板内核提供了全新的机会。我们展示了在KC705 Xilinx FPGA板上实现我们的方法的可行性,并使用几个满足准模板要求的自定义代码段对内存分区中的一些最先进的方法进行了测试。我们实现了分区因子的显著降低,也许更重要的是使其与内存访问的数量成正比,而不是依赖于问题的大小,从而浪费了一些空间。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信