2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)最新文献

筛选

英文中文

Enhancing Butterfly Fat Tree NoCs for FPGAs with Lightweight Flow Control 基于轻量级流量控制的fpga蝶形胖树noc增强

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2019-02-20 DOI: 10.1145/3289602.3294002

G. Malik, Nachiket Kapre

{"title":"Enhancing Butterfly Fat Tree NoCs for FPGAs with Lightweight Flow Control","authors":"G. Malik, Nachiket Kapre","doi":"10.1145/3289602.3294002","DOIUrl":"https://doi.org/10.1145/3289602.3294002","url":null,"abstract":"FPGA overlay networks-on-chip (NoCs) based on Butterfly Fat Tree (BFT) topology and lightweight flow control can outperform state-of-the-art FPGA NoCs, such as Hoplite and others, on metrics such as throughput, latency, cost and power efficiency, and features such as in-order delivery and bounded packet delivery times. On one hand, lightweight FPGA NoCs built on the principle of bufferless deflection routing, such as Hoplite, can deliver low-LUT-cost implementations but sacrifice crucial features such as in-order delivery, livelock freedom, and bounds on delivery times. On the other hand, capable conventional NoCs like CONNECT provide these features but are significantly more expensive in LUT cost. Butterfly Fat Trees with lightweight flow control can deliver these features at medium cost while providing bandwidth configuration flexibility to the developer. We design FPGA-friendly routers with (1) latency-insensitive interfaces, coupled with (2) deterministic routing policy, and (3) round-robin scheduling at NoC ports to develop switches that take 311-375 LUTs/router. We evaluate our NoC under various conditions including synthetic and real-world workloads to deliver resource-proportional throughput and latency wins over competing NoCs, while significantly improving dynamic power consumption when compared to deflection-routed NoCs. We also explore the bandwidth customizability of the BFT organization to identify best NoC configurations for resource-constrained and application-requirement constrained scenarios.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"217 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133851828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 5

Generic Connectivity-Based CGRA Mapping via Integer Linear Programming 基于整数线性规划的通用连通CGRA映射

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2019-01-30 DOI: 10.1109/FCCM.2019.00019

Matthew James Peter Walker, J. Anderson

{"title":"Generic Connectivity-Based CGRA Mapping via Integer Linear Programming","authors":"Matthew James Peter Walker, J. Anderson","doi":"10.1109/FCCM.2019.00019","DOIUrl":"https://doi.org/10.1109/FCCM.2019.00019","url":null,"abstract":"Coarse-grained reconfigurable architectures (CGRAs) are programmable logic devices with large coarsegrained ALU-like logic blocks, and multi-bit datapath-style routing. CGRAs often have relatively restricted data routing networks, so they attract CAD mapping tools that use exact methods, such as Integer Linear Programming (ILP). However, tools that target general architectures must use large constraint systems to fully describe an architecture's flexibility, resulting in lengthy run-times. In this paper, we propose to derive connectivity information from an otherwise generic device model, and use this to create simpler ILPs, which we combine in an iterative schedule and retain most of the exactness of a fully-generic ILP approach. This new approach has a speed-up geometric mean of 5.88x when considering benchmarks that do not hita time-limit of 7.5 hours on the fully-generic ILP, and 37.6x otherwise. This was measured using the set of benchmarks used to originally evaluate the fully-generic approach and several more benchmarks representing computation tasks, over three different CGRA architectures. All run-times of the new approach are less than 20 minutes, with 90th percentile time of 410 seconds. The proposed mapping techniques are integrated into, and evaluated using the open-source CGRA-ME architecture modelling and exploration framework.","PeriodicalId":116955,"journal":{"name":"2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130292197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 26

AutoPhase: Compiler Phase-Ordering for HLS with Deep Reinforcement Learning 基于深度强化学习的HLS编译器相位排序

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2019-01-15 DOI: 10.1109/FCCM.2019.00049

Qijing Huang, Ameer Haj-Ali, William S. Moses, J. Xiang, I. Stoica, K. Asanović, J. Wawrzynek

引用次数: 30

首页上一页