Transformations for throughput optimization in high-level synthesis (abstract only)
Authors: Peng Li, L. Pouchet, Deming Chen, J. Cong
Published in: Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays, 2014-02-26
DOI: https://doi.org/10.1145/2554688.2554772
Citations: 2
Abstract
Programming productivity for FPGA devices remains a significant challenge, despite the emergence of robust high-level synthesis (HLS) tools that automatically transform code written in high-level languages into RTL implementations. For the class of programs with regular loop bounds and array accesses (so-called affine programs), the polyhedral compilation framework provides a convenient environment for automating many of the manual program transformations still needed to improve the quality of results (QoR) of the HLS tool. In this work, we demonstrate that tiling-driven affine loop transformations, while mandatory to ensure good data reuse and reduce off-chip communication volume, are not always enough to achieve the best throughput, which is determined by the Initiation Interval (II) of loop pipelining. We develop additional techniques to optimize the computation part executed on the FPGA, using Index-Set Splitting (ISS) to split loops into sub-loops with different properties (sequential or parallel, different memory-port conflict characteristics). This is motivated by the presence of non-uniform data dependences in some affine benchmarks, which are not handled effectively by the affine transformation system for tiling implemented in the PolyOpt/HLS software. We develop a customized affine+ISS optimization algorithm that aims to reduce the II of pipelined inner loops and thereby the program latency. We report experimental results on numerous affine computations.
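
To make the idea of index-set splitting concrete, the following is a minimal C sketch on a textbook-style loop with a non-uniform dependence. It is not taken from the paper or from PolyOpt/HLS; the function names, the `MAX_N` bound, and the choice of split point are assumptions made purely for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_N 64  /* hypothetical size bound for this sketch */

/* Original loop: a[n-1-i] has not been written yet while i is in the
 * first half, but has already been written once i reaches the second
 * half, so the dependence distance varies with i (non-uniform). */
void reverse_add_original(int *a, const int *b, int n) {
    for (int i = 0; i < n; i++)
        a[i] = a[n - 1 - i] + b[i];
}

/* After index-set splitting at i = n/2: each sub-loop writes one half
 * of a[] and reads only the other half (or its own element when n is
 * odd), so neither sub-loop carries a dependence across iterations. */
void reverse_add_split(int *a, const int *b, int n) {
    int mid = n / 2;
    for (int i = 0; i < mid; i++)   /* reads only original values */
        a[i] = a[n - 1 - i] + b[i];
    for (int i = mid; i < n; i++)   /* reads values written by sub-loop 1 */
        a[i] = a[n - 1 - i] + b[i];
}

/* Returns 1 if both versions produce identical results for size n. */
int versions_agree(int n) {
    assert(n >= 0 && n <= MAX_N);
    int a1[MAX_N], a2[MAX_N], b[MAX_N];
    for (int i = 0; i < n; i++) {
        a1[i] = a2[i] = 3 * i + 1;
        b[i] = i * i;
    }
    reverse_add_original(a1, b, n);
    reverse_add_split(a2, b, n);
    return memcmp(a1, a2, (size_t)n * sizeof(int)) == 0;
}
```

The split keeps the original iteration order, so the results are unchanged; what changes is that each sub-loop is free of loop-carried dependences on `a`, which is the kind of property a pipelining HLS tool could plausibly exploit to lower the II. Whether it does so in practice depends on the tool and on memory-port constraints that this sketch does not model.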