{"title":"Enhancing Butterfly Fat Tree NoCs for FPGAs with Lightweight Flow Control","authors":"G. Malik, Nachiket Kapre","doi":"10.1145/3289602.3294002","DOIUrl":"https://doi.org/10.1145/3289602.3294002","url":null,"abstract":"FPGA overlay networks-on-chip (NoCs) based on Butterfly Fat Tree (BFT) topology and lightweight flow control can outperform state-of-the-art FPGA NoCs, such as Hoplite and others, on metrics such as throughput, latency, cost and power efficiency, and features such as in-order delivery and bounded packet delivery times. On one hand, lightweight FPGA NoCs built on the principle of bufferless deflection routing, such as Hoplite, can deliver low-LUT-cost implementations but sacrifice crucial features such as in-order delivery, livelock freedom, and bounds on delivery times. On the other hand, capable conventional NoCs like CONNECT provide these features but are significantly more expensive in LUT cost. Butterfly Fat Trees with lightweight flow control can deliver these features at medium cost while providing bandwidth configuration flexibility to the developer. We design FPGA-friendly routers with (1) latency-insensitive interfaces, coupled with (2) deterministic routing policy, and (3) round-robin scheduling at NoC ports to develop switches that take 311-375 LUTs/router. We evaluate our NoC under various conditions including synthetic and real-world workloads to deliver resource-proportional throughput and latency wins over competing NoCs, while significantly improving dynamic power consumption when compared to deflection-routed NoCs. 
We also explore the bandwidth customizability of the BFT organization to identify best NoC configurations for resource-constrained and application-requirement constrained scenarios.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"217 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133851828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generic Connectivity-Based CGRA Mapping via Integer Linear Programming","authors":"Matthew James Peter Walker, J. Anderson","doi":"10.1109/FCCM.2019.00019","DOIUrl":"https://doi.org/10.1109/FCCM.2019.00019","url":null,"abstract":"Coarse-grained reconfigurable architectures (CGRAs) are programmable logic devices with large coarse-grained ALU-like logic blocks, and multi-bit datapath-style routing. CGRAs often have relatively restricted data routing networks, so they attract CAD mapping tools that use exact methods, such as Integer Linear Programming (ILP). However, tools that target general architectures must use large constraint systems to fully describe an architecture's flexibility, resulting in lengthy run-times. In this paper, we propose to derive connectivity information from an otherwise generic device model, and use this to create simpler ILPs, which we combine in an iterative schedule and retain most of the exactness of a fully-generic ILP approach. This new approach has a speed-up geometric mean of 5.88x when considering benchmarks that do not hit a time limit of 7.5 hours on the fully-generic ILP, and 37.6x otherwise. This was measured using the set of benchmarks used to originally evaluate the fully-generic approach and several more benchmarks representing computation tasks, over three different CGRA architectures. All run-times of the new approach are less than 20 minutes, with a 90th percentile time of 410 seconds. 
The proposed mapping techniques are integrated into, and evaluated using the open-source CGRA-ME architecture modelling and exploration framework.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130292197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AutoPhase: Compiler Phase-Ordering for HLS with Deep Reinforcement Learning","authors":"Qijing Huang, Ameer Haj-Ali, William S. Moses, J. Xiang, I. Stoica, K. Asanović, J. Wawrzynek","doi":"10.1109/FCCM.2019.00049","DOIUrl":"https://doi.org/10.1109/FCCM.2019.00049","url":null,"abstract":"The performance of the code generated by a compiler depends on the order in which the optimization passes are applied. In high-level synthesis, the quality of the generated circuit relates directly to the code generated by the front-end compiler. Choosing a good order–often referred to as the phase-ordering problem–is an NP-hard problem. In this paper, we evaluate a new technique to address the phase-ordering problem: deep reinforcement learning. We implement a framework in the context of the LLVM compiler to optimize the ordering for HLS programs and compare the performance of deep reinforcement learning to state-of-the-art algorithms that address the phase-ordering problem. Overall, our framework runs one to two orders of magnitude faster than these algorithms, and achieves a 16% improvement in circuit performance over the -O3 compiler flag.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134129295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}