Improving high level synthesis optimization opportunity through polyhedral transformations

Wei Zuo, Yun Liang, Peng Li, K. Rupnow, Deming Chen, J. Cong
{"title":"Improving high level synthesis optimization opportunity through polyhedral transformations","authors":"Wei Zuo, Yun Liang, Peng Li, K. Rupnow, Deming Chen, J. Cong","doi":"10.1145/2435264.2435271","DOIUrl":null,"url":null,"abstract":"High level synthesis (HLS) is an important enabling technology for the adoption of hardware accelerator technologies. It promises the performance and energy efficiency of hardware designs with a lower barrier to entry in design expertise, and shorter design time. State-of-the-art high level synthesis now includes a wide variety of powerful optimizations that implement efficient hardware. These optimizations can implement some of the most important features generally performed in manual designs including parallel hardware units, pipelining of execution both within a hardware unit and between units, and fine-grained data communication. We may generally classify the optimizations as those that optimize hardware implementation within a code block (intra-block) and those that optimize communication and pipelining between code blocks (inter-block). However, both optimizations are in practice difficult to apply. Real-world applications contain data-dependent blocks of code and communicate through complex data access patterns. Existing high level synthesis tools cannot apply these powerful optimizations unless the code is inherently compatible, severely limiting the optimization opportunity. In this paper we present an integrated framework to model and enable both intra- and inter-block optimizations. This integrated technique substantially improves the opportunity to use the powerful HLS optimizations that implement parallelism, pipelining, and fine-grained communication. Our polyhedral model-based technique systematically defines a set of data access patterns, identifies effective data access patterns, and performs the loop transformations to enable the intra- and inter-block optimizations. Our framework automatically explores transformation options, performs code transformations, and inserts the appropriate HLS directives to implement the HLS optimizations. Furthermore, our framework can automatically generate the optimized communication blocks for fine-grained communication between hardware blocks. Experimental evaluation demonstrates that we can achieve an average of 6.04X speedup over the high level synthesis solution without our transformations to enable intra- and inter-block optimizations.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"27 1","pages":"9-18"},"PeriodicalIF":0.0000,"publicationDate":"2013-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"85","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2435264.2435271","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 85

Abstract

High level synthesis (HLS) is an important enabling technology for the adoption of hardware accelerator technologies. It promises the performance and energy efficiency of hardware designs with a lower barrier to entry in design expertise, and shorter design time. State-of-the-art high level synthesis now includes a wide variety of powerful optimizations that implement efficient hardware. These optimizations can implement some of the most important features generally performed in manual designs including parallel hardware units, pipelining of execution both within a hardware unit and between units, and fine-grained data communication. We may generally classify the optimizations as those that optimize hardware implementation within a code block (intra-block) and those that optimize communication and pipelining between code blocks (inter-block). However, both optimizations are in practice difficult to apply. Real-world applications contain data-dependent blocks of code and communicate through complex data access patterns. Existing high level synthesis tools cannot apply these powerful optimizations unless the code is inherently compatible, severely limiting the optimization opportunity. In this paper we present an integrated framework to model and enable both intra- and inter-block optimizations. This integrated technique substantially improves the opportunity to use the powerful HLS optimizations that implement parallelism, pipelining, and fine-grained communication. Our polyhedral model-based technique systematically defines a set of data access patterns, identifies effective data access patterns, and performs the loop transformations to enable the intra- and inter-block optimizations. Our framework automatically explores transformation options, performs code transformations, and inserts the appropriate HLS directives to implement the HLS optimizations. Furthermore, our framework can automatically generate the optimized communication blocks for fine-grained communication between hardware blocks. Experimental evaluation demonstrates that we can achieve an average of 6.04X speedup over the high level synthesis solution without our transformations to enable intra- and inter-block optimizations.
通过多面体变换提高高级综合优化机会
高阶合成(High - level synthesis, HLS)是采用硬件加速器技术的重要使能技术。它保证了硬件设计的性能和能效,降低了进入设计专业知识的门槛,缩短了设计时间。最先进的高级综合现在包括实现高效硬件的各种强大优化。这些优化可以实现通常在手动设计中执行的一些最重要的功能,包括并行硬件单元、硬件单元内部和单元之间的执行流水线,以及细粒度的数据通信。我们通常可以将优化分为在代码块内(块内)优化硬件实现的优化和在代码块之间(块间)优化通信和流水线的优化。然而,这两种优化在实践中都很难应用。现实世界的应用程序包含依赖于数据的代码块,并通过复杂的数据访问模式进行通信。现有的高级综合工具不能应用这些强大的优化,除非代码本身是兼容的,这严重限制了优化的机会。在本文中,我们提出了一个集成的框架来建模和实现块内和块间的优化。这种集成技术大大提高了使用强大的HLS优化的机会,这些优化实现了并行性、流水线和细粒度通信。我们基于多面体模型的技术系统地定义了一组数据访问模式,识别有效的数据访问模式,并执行循环转换以实现块内和块间优化。我们的框架会自动探索转换选项,执行代码转换,并插入适当的HLS指令来实现HLS优化。此外,我们的框架可以自动生成优化的通信块,用于硬件块之间的细粒度通信。实验评估表明,在没有转换以实现块内和块间优化的情况下,我们可以在高级合成解决方案上实现平均6.04倍的加速。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信