FastTrack: Exploiting Fast FPGA Wiring for Implementing NoC Shortcuts (Abstract Only)
Authors: Nachiket Kapre, T. Krishna
Published in: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Publication date: 2018-02-15
DOI: 10.1145/3174243.3174962 (https://doi.org/10.1145/3174243.3174962)
Citations: 0
Abstract
The latency of packet-switched FPGA overlay Networks-on-Chip (NoCs) grows linearly with the NoC dimensions, since packets typically spend a cycle in each dynamic router along the path. High-performance FPGA NoCs have to aggressively pipeline interconnects, which adds further latency overhead to the NoC. The use of FPGA-friendly deflection routing schemes exacerbates latency even more. Fortunately, FPGAs provide segmented interconnects with different lengths (speeds). Faster FPGA tracks can be used to reduce the number of switchbox hops along the packet path. We introduce FastTrack, an adaptation of the NoC organization that inserts express bypass links in the NoC to skip multiple router stages in a single clock cycle. Our FastTrack design can be tuned to support different express link lengths for performance, and depopulation strategies for controlling cost. For the Xilinx Virtex-7 485T FPGA, an 8×8 FastTrack NoC is 2× larger than a base Hoplite NoC, but operates at between 1.2× and 0.8× the clock frequency of the base NoC when using express links of length 2 to 4. FastTrack delivers throughput and latency improvements across a range of statistical workloads (2-2.5×) and on traces extracted from FPGA accelerator case studies such as Sparse Matrix-Vector Multiplication (2.5×), Graph Analytics (2.8×), and Multi-processor overlay applications (2×). FastTrack also improves energy efficiency by factors of up to 2× over baseline Hoplite, due to higher sustained rates and the high-speed operation of express links made possible by fast FPGA interconnect.
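The hop-count argument in the abstract can be illustrated with a small model. The following is a minimal sketch, not the authors' design: it assumes an n×n unidirectional torus with X-then-Y routing, an express link of length `express_len` available at every router (no depopulation), one cycle per link traversal, and no deflections. The function names and parameters are hypothetical and chosen only for illustration.

```python
def ring_distance(src, dst, n):
    """Forward distance from src to dst on a unidirectional ring of n routers."""
    return (dst - src) % n


def hops(src, dst, n, express_len=1):
    """Link traversals from src to dst on an n x n unidirectional torus.

    Assumption (not from the source): every router has an express link that
    skips `express_len` routers in one cycle, and a packet takes as many
    express links as fit before finishing on local (length-1) links.
    express_len=1 models a NoC without express links.
    """
    total = 0
    for s, d in zip(src, dst):  # X dimension first, then Y
        express, local = divmod(ring_distance(s, d, n), express_len)
        total += express + local
    return total


def mean_hops(n, express_len=1):
    """Average hop count over all source/destination pairs with src != dst."""
    nodes = [(x, y) for x in range(n) for y in range(n)]
    counts = [hops(s, d, n, express_len) for s in nodes for d in nodes if s != d]
    return sum(counts) / len(counts)


if __name__ == "__main__":
    for length in (1, 2, 4):
        print(f"express_len={length}: mean hops = {mean_hops(8, length):.2f}")
```

In this model, the longest 8×8 route, from (0, 0) to (7, 7), takes 14 hops without express links and 8 hops with express links of length 2. The model counts hops only; it does not capture deflection penalties, the change in clock frequency, or the area cost reported in the abstract, so its ratios should not be read as the paper's measured speedups.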