Towards automatic customization of interconnect and memory in the CoRAM abstraction (abstract only)
Authors: Eric S. Chung, Michael Papamichael
Venue: FPGA. ACM International Symposium on Field-Programmable Gate Arrays, volume 23 1, page 265
Published: 2013-02-11
DOI: 10.1145/2435264.2435311 (https://doi.org/10.1145/2435264.2435311)
Citation count: 0
Abstract
When developing applications to run on FPGAs, we tend to expend great effort on crafting the custom hardware acceleration datapath, but blindly turn to the FPGA vendor tool/library to provide default solutions for on-chip interconnect and external interfaces. This often leads to ineffective communication-bound or memory-bound implementations, since the design and tuning of the default general-purpose solutions necessarily make compromises for generality. This is counterproductive, as the FPGA's flexible reconfigurability should afford us great opportunities for performance gain and cost reduction through extensive application-specific customization of the interconnect and interface IPs. This work presents a compiler that generates a custom interconnect topology, with connectivity and capacity scaled to support an application's exact communication requirements at minimized cost. More specifically, the compiler analyzes an application developed for the CoRAM abstraction [1,2] to determine its connectivity and bandwidth requirements between the hardware processing kernels and external DRAM banks. The result is a finely tuned, custom-topology soft-logic network-on-chip interconnect, enabled by the CONNECT NoC framework [3].
We perform an extensive evaluation that benchmarks two applications against the standard CoRAM implementation flow, which relies on a fixed, generically tuned, general-purpose soft-logic network-on-chip. Our RTL-driven evaluation shows a large opportunity for area reduction and improved efficiency (up by 48%) without any impact on application performance.
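
The abstract describes the compiler only at a high level, so the following is a toy sketch of the general idea rather than the authors' algorithm or the CONNECT framework's interface. It assumes a hypothetical list of kernel-to-DRAM-bank flows with bandwidth demands, an assumed 100 MHz interconnect clock, and an assumed set of selectable link widths. It keeps only the links that are actually used, picks the narrowest width that covers each demand, and compares the summed link width against a fully connected fixed-width baseline as a crude cost proxy. Every name and number here (`Flow`, `min_link_width`, `WIDTHS_BITS`, the example flows) is invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# All names and numbers below are hypothetical, chosen only for illustration.
CLOCK_MHZ = 100                        # assumed interconnect clock
WIDTHS_BITS = (16, 32, 64, 128, 256)   # assumed selectable link widths


@dataclass(frozen=True)
class Flow:
    kernel: str          # hardware processing kernel
    dram_bank: str       # external DRAM bank it communicates with
    mbytes_per_s: float  # required sustained bandwidth


def min_link_width(mbytes_per_s, clock_mhz=CLOCK_MHZ, widths=WIDTHS_BITS):
    """Return the narrowest candidate width whose raw bandwidth covers the demand."""
    # MB/s * 8 gives Mbit/s; dividing by MHz gives bits needed per cycle.
    bits_per_cycle = mbytes_per_s * 8 / clock_mhz
    for width in sorted(widths):
        if width >= bits_per_cycle:
            return width
    raise ValueError(f"no candidate width sustains {mbytes_per_s} MB/s")


def custom_topology(flows):
    """Keep only kernel/bank pairs that communicate, each sized to its demand."""
    demand = defaultdict(float)
    for flow in flows:
        demand[(flow.kernel, flow.dram_bank)] += flow.mbytes_per_s
    return {pair: min_link_width(bw) for pair, bw in demand.items()}


def total_link_bits(topology):
    """Crude cost proxy: sum of link widths in bits."""
    return sum(topology.values())


def baseline_link_bits(flows, width):
    """Cost proxy for a generic design: every kernel linked to every bank."""
    kernels = {flow.kernel for flow in flows}
    banks = {flow.dram_bank for flow in flows}
    return len(kernels) * len(banks) * width


flows = [
    Flow("k0", "dram0", 400.0),
    Flow("k1", "dram0", 100.0),
    Flow("k1", "dram1", 1000.0),
]
custom = custom_topology(flows)
print(custom)
print(total_link_bits(custom), baseline_link_bits(flows, 128))
```

With these made-up flows, the sketch drops the unused `k0`-to-`dram1` link and narrows the lightly loaded ones. A real flow would also have to account for routing, buffering, and arbitration, none of which this proxy models.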