Exploiting Irregular Memory Parallelism in Quasi-Stencils through Nonlinear Transformation

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2019-04-01 DOI:10.1109/FCCM.2019.00039

Juan Escobedo, Mingjie Lin

{"title":"Exploiting Irregular Memory Parallelism in Quasi-Stencils through Nonlinear Transformation","authors":"Juan Escobedo, Mingjie Lin","doi":"10.1109/FCCM.2019.00039","DOIUrl":null,"url":null,"abstract":"Non-stencil kernels with irregular memory accesses pose unique challenges to achieving high computing performance and hardware efficiency in high-level synthesis (HLS) of FPGA. We present a highly versatile and systematic approach to effectively synthesizing a special and important subset of non-stencil computing kernels, quasi-stencils, which possess the mathematical property that, if studied in a particular kind of high-dimensional space corresponding to the prime factorization space, the distance between the memory accesses during each kernel iteration becomes constant and such an irregular non-stencil can be considered as a stencil. This opens the door to exploiting a vast array of existing memory optimization algorithms, such as memory partitioning/banking and data reuse, originally designed for the standard stencil-based kernel computing, therefore offering totally new opportunity to effectively synthesizing irregular non-stencil kernels. We show the feasibility of our approach implementing our methodology in a KC705 Xilinx FPGA board and tested it with several custom code segments that meet the quasi-stencil requirement vs some of the state-of the art methods in memory partitioning. We achieve significant reduction in partition factor, and perhaps more importantly making it proportional to the number of memory accesses instead of depending on the problem size with the cost of some wasted space.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2019.00039","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Non-stencil kernels with irregular memory accesses pose unique challenges to achieving high computing performance and hardware efficiency in high-level synthesis (HLS) of FPGA. We present a highly versatile and systematic approach to effectively synthesizing a special and important subset of non-stencil computing kernels, quasi-stencils, which possess the mathematical property that, if studied in a particular kind of high-dimensional space corresponding to the prime factorization space, the distance between the memory accesses during each kernel iteration becomes constant and such an irregular non-stencil can be considered as a stencil. This opens the door to exploiting a vast array of existing memory optimization algorithms, such as memory partitioning/banking and data reuse, originally designed for the standard stencil-based kernel computing, therefore offering totally new opportunity to effectively synthesizing irregular non-stencil kernels. We show the feasibility of our approach implementing our methodology in a KC705 Xilinx FPGA board and tested it with several custom code segments that meet the quasi-stencil requirement vs some of the state-of the art methods in memory partitioning. We achieve significant reduction in partition factor, and perhaps more importantly making it proportional to the number of memory accesses instead of depending on the problem size with the cost of some wasted space.

查看原文本刊更多论文

利用非线性变换开发准模板的不规则存储并行性

在FPGA的高级综合(high-level synthesis, HLS)中，具有不规则存储器访问的非模板内核对实现高计算性能和硬件效率提出了独特的挑战。我们提出了一种高度通用和系统的方法来有效地综合非模板计算核的一个特殊而重要的子集——准模板。准模板具有这样的数学性质:如果在与质因数分解空间相对应的特定高维空间中进行研究，则每次内核迭代期间存储器访问之间的距离是恒定的，这样的不规则非模板可以被认为是一个模板。这为开发大量现有的内存优化算法打开了大门，例如内存分区/银行和数据重用，这些算法最初是为标准的基于模板的内核计算设计的，因此为有效地合成不规则的非模板内核提供了全新的机会。我们展示了在KC705 Xilinx FPGA板上实现我们的方法的可行性，并使用几个满足准模板要求的自定义代码段对内存分区中的一些最先进的方法进行了测试。我们实现了分区因子的显著降低，也许更重要的是使其与内存访问的数量成正比，而不是依赖于问题的大小，从而浪费了一些空间。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

自引率

0.00%

发文量