{"title":"fpga上高能效数据流计算的操作调度与架构协同合成(仅摘要)","authors":"C. Y. Lin, N. Wong, Hayden Kwok-Hay So","doi":"10.1145/2145694.2145757","DOIUrl":null,"url":null,"abstract":"Compiling high-level user applications for execution on FPGAs often involves synthesizing dataflow graphs beyond the size of the available on-chip computational resources. One way to address this is by folding the execution of the given dataflow graphs onto an array of directly connected simple configurable processing elements (CPEs). Under this scenario, the performance and energy-efficiency of the resulting system depends not only on the mapping schedule of the compute operations on the CPEs, but also on the topology of the interconnect array that connects the CPEs. This paper presents a framework in which the operation scheduler and the underlying CPE interconnect network topology are co-optimized on a per-application basis for energy-efficient FPGA computation. Given the same application, more than 2.5x difference in energy-efficiency was achievable by the use of different common regular array topologies to connect the CPEs. Moreover, by using irregular application-specific interconnect topologies derived from a genetic algorithm, up to 50% improvement in energy-delay-product was achievable when compared to the use of even the best regular topology. The use of such framework is anticipated to serve as part of a rapid high-level FPGA application compiler since minimum hardware place-and-route is needed to generate the optimal schedule and topology.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"38 8","pages":"270"},"PeriodicalIF":0.0000,"publicationDate":"2012-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Operation scheduling and architecture co-synthesis for energy-efficient dataflow computations on FPGAs (abstract only)\",\"authors\":\"C. Y. Lin, N. Wong, Hayden Kwok-Hay So\",\"doi\":\"10.1145/2145694.2145757\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Compiling high-level user applications for execution on FPGAs often involves synthesizing dataflow graphs beyond the size of the available on-chip computational resources. One way to address this is by folding the execution of the given dataflow graphs onto an array of directly connected simple configurable processing elements (CPEs). Under this scenario, the performance and energy-efficiency of the resulting system depends not only on the mapping schedule of the compute operations on the CPEs, but also on the topology of the interconnect array that connects the CPEs. This paper presents a framework in which the operation scheduler and the underlying CPE interconnect network topology are co-optimized on a per-application basis for energy-efficient FPGA computation. Given the same application, more than 2.5x difference in energy-efficiency was achievable by the use of different common regular array topologies to connect the CPEs. Moreover, by using irregular application-specific interconnect topologies derived from a genetic algorithm, up to 50% improvement in energy-delay-product was achievable when compared to the use of even the best regular topology. The use of such framework is anticipated to serve as part of a rapid high-level FPGA application compiler since minimum hardware place-and-route is needed to generate the optimal schedule and topology.\",\"PeriodicalId\":87257,\"journal\":{\"name\":\"FPGA. ACM International Symposium on Field-Programmable Gate Arrays\",\"volume\":\"38 8\",\"pages\":\"270\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-02-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"FPGA. ACM International Symposium on Field-Programmable Gate Arrays\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2145694.2145757\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2145694.2145757","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Operation scheduling and architecture co-synthesis for energy-efficient dataflow computations on FPGAs (abstract only)
Compiling high-level user applications for execution on FPGAs often involves synthesizing dataflow graphs beyond the size of the available on-chip computational resources. One way to address this is by folding the execution of the given dataflow graphs onto an array of directly connected simple configurable processing elements (CPEs). Under this scenario, the performance and energy-efficiency of the resulting system depends not only on the mapping schedule of the compute operations on the CPEs, but also on the topology of the interconnect array that connects the CPEs. This paper presents a framework in which the operation scheduler and the underlying CPE interconnect network topology are co-optimized on a per-application basis for energy-efficient FPGA computation. Given the same application, more than 2.5x difference in energy-efficiency was achievable by the use of different common regular array topologies to connect the CPEs. Moreover, by using irregular application-specific interconnect topologies derived from a genetic algorithm, up to 50% improvement in energy-delay-product was achievable when compared to the use of even the best regular topology. The use of such framework is anticipated to serve as part of a rapid high-level FPGA application compiler since minimum hardware place-and-route is needed to generate the optimal schedule and topology.