Extracting Data Parallelism in Non-Stencil Kernel Computing by Optimally Coloring Folded Memory Conflict Graph

Juan Escobedo, Mingjie Lin
2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), pp. 1-6. Published 2018-06-01. DOI: 10.1145/3195970.3196088 (https://doi.org/10.1145/3195970.3196088)
Citations: 4

Abstract

Irregular memory access patterns in non-stencil kernel computing render the well-known hyperplane- [1], lattice- [2], and tessellation-based [3] HLS techniques ineffective. We develop an elegant yet effective technique that synthesizes a memory-optimal architecture from high-level software code in order to maximize application-specific data parallelism. Our basic idea is to exploit the graph structures embedded in the data access pattern and the computation structure to perform memory banking that maximizes parallel memory accesses while conserving both hardware resources and energy. Specifically, we priority-color a weighted conflict graph, generated by folding the fundamental conflict graph, to maximize memory conflict reduction. Most interestingly, our graph-based methodology enables a straightforward tradeoff between the number of memory banks and the degree of memory conflict reduction. We empirically test our methodology with Vivado HLx 2015.4 on a standard Kintex-7 device for six benchmark computing kernels by measuring conflict reduction. In particular, our approach requires only 9.56% of the LUTs, 3.2% of the FFs, 2.5% of the BRAMs, and 11.33% of the DSPs available on the device to obtain a mapping function that achieves a 90% conflict reduction on a modified forward Gaussian elimination kernel with 4 simultaneous memory accesses.
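The abstract does not give the folding rule, the edge weights, or the coloring order, so the following Python sketch is only an illustration of the general idea under stated assumptions, not the authors' algorithm. It assumes that "folding" maps each address to a node by `address % fold`, that an edge weight counts how often two folded nodes are accessed in the same cycle, and that "priority coloring" is a greedy pass that colors the heaviest nodes first. The names `folded_conflict_graph`, `priority_color`, `conflict_weight`, and the parameter `fold` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def folded_conflict_graph(access_groups, fold):
    """Build a weighted conflict graph over folded addresses.

    access_groups: iterable of tuples of addresses accessed in the same cycle.
    fold: assumed fold size; address a is mapped to node a % fold.
    Returns a dict mapping (u, v) with u < v to the number of groups
    that touch both folded nodes. Addresses of one group that fold onto
    the same node are merged here and not counted as an edge.
    """
    weights = defaultdict(int)
    for group in access_groups:
        nodes = sorted({a % fold for a in group})
        for u, v in combinations(nodes, 2):
            weights[(u, v)] += 1
    return dict(weights)


def priority_color(weights, num_nodes, num_banks):
    """Greedy coloring with a fixed number of banks (colors).

    Nodes are visited in order of decreasing weighted degree (assumed
    priority). Each node goes to the bank that adds the least conflict
    weight with its already-placed neighbors; ties go to the lowest bank.
    """
    adj = defaultdict(dict)
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w
    order = sorted(range(num_nodes), key=lambda n: (-sum(adj[n].values()), n))
    bank = {}
    for n in order:
        cost = [0] * num_banks
        for m, w in adj[n].items():
            if m in bank:
                cost[bank[m]] += w
        bank[n] = min(range(num_banks), key=lambda b: (cost[b], b))
    return bank


def conflict_weight(weights, bank):
    """Total weight of edges whose endpoints share a bank."""
    return sum(w for (u, v), w in weights.items() if bank[u] == bank[v])


# Toy access trace: each tuple lists addresses read in the same cycle.
groups = [(0, 1), (0, 1), (1, 2), (0, 2)]
weights = folded_conflict_graph(groups, fold=4)
for banks in (1, 2, 3):
    mapping = priority_color(weights, num_nodes=4, num_banks=banks)
    print(banks, conflict_weight(weights, mapping))
```

On this toy trace the remaining conflict weight falls from 4 with one bank to 1 with two banks and 0 with three, which illustrates the kind of bank-count versus conflict tradeoff the abstract describes. A greedy pass like this one is not guaranteed to be optimal.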