A high-performance overlay architecture for pipelined execution of data flow graphs
Authors: D. Capalija, T. Abdelrahman
Venue: 2013 23rd International Conference on Field programmable Logic and Applications
Published: 2013-10-24
DOI: 10.1109/FPL.2013.6645515 (https://doi.org/10.1109/FPL.2013.6645515)
Citations: 79
Abstract
A major issue facing the widespread use of FPGAs as accelerators is their programmability wall: the difficulty of hardware design and the long synthesis times. Overlays, pre-synthesized FPGA circuits that are themselves reconfigurable, promise to tackle these challenges. We design and evaluate an overlay architecture, structured as a mesh of functional units, for pipelined execution of data-flow graphs (DFGs), a common abstraction for expressing parallelism in applications. We use data-driven execution based on elastic pipelines to balance pipeline latencies and achieve a high fMAX, scalability and maximum throughput. We prototype two overlays on a Stratix IV FPGA: a 355 MHz 24×16 integer overlay and a 312 MHz 18×16 floating-point overlay. We also design a tool that maps DFGs to overlays. We map 15 DFGs and show that the two overlays deliver throughputs of up to 35 GOPS and 22 GFLOPS, respectively. We also show that DFG mapping is fast, taking no more than 6 seconds for the largest DFG. Thus, our overlay architecture raises the level of abstraction of FPGA programming closer to that of software and avoids lengthy synthesis time, easing the use of these devices to accelerate applications.
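The clock rates, mesh sizes and throughputs quoted above can be related with some back-of-the-envelope arithmetic. The sketch below is not taken from the paper. It assumes that each functional unit completes at most one operation per clock cycle and that a fully pipelined DFG produces one result per operation node per cycle. The function names and the `ops_per_unit_per_cycle` parameter are hypothetical and used only for illustration.

```python
def peak_throughput_gops(rows, cols, fmax_mhz, ops_per_unit_per_cycle=1):
    """Upper bound on throughput if every functional unit did useful work.

    Assumption (not stated in the abstract): one operation per unit per cycle.
    """
    units = rows * cols
    # units * MHz gives millions of operations per second; divide by 1000 for giga.
    return units * fmax_mhz * ops_per_unit_per_cycle / 1000.0


def implied_ops_per_cycle(throughput_gops, fmax_mhz):
    """Operations completed per clock cycle implied by a reported throughput."""
    return throughput_gops * 1000.0 / fmax_mhz


# Mesh sizes, clock rates and throughputs as quoted in the abstract.
int_peak = peak_throughput_gops(24, 16, 355)
fp_peak = peak_throughput_gops(18, 16, 312)
int_ops = implied_ops_per_cycle(35, 355)
fp_ops = implied_ops_per_cycle(22, 312)

print(f"integer overlay: bound {int_peak:.1f} GOPS, implied {int_ops:.1f} ops/cycle")
print(f"floating-point overlay: bound {fp_peak:.1f} GFLOPS, implied {fp_ops:.1f} ops/cycle")
```

Under these assumptions the bounds are about 136 GOPS and 90 GFLOPS. The reported peaks of 35 GOPS and 22 GFLOPS then correspond to roughly 99 and 71 operations completing per cycle. The abstract does not say how many functional units a mapped DFG occupies, so the gap to the bound should not be read as a measured utilization figure.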