Processor Assisted Worklist Scheduling for FPGA Accelerated Graph Processing on a Shared-Memory Platform

2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). Pub Date: 2019-04-01. DOI: 10.1109/FCCM.2019.00028

Yu Wang, J. Hoe, E. Nurvitadhi


Citations: 17

Abstract


FPGA-based processing has gained much attention for accelerating graph analytics because of the demand for performance and energy efficiency. However, although priority scheduling has been shown to be an effective optimization for worklist-based graph computations, it is rarely used in accelerator designs because of its implementation complexity and memory-access overhead. In this paper, we present a heterogeneous processing approach for priority scheduling on a shared-memory CPU-FPGA platform. By exploiting the closely coupled integration of the host processor and the FPGA accelerator, our system dynamically offloads the task of scheduling to a software scheduler on the processor, chosen for its programmability, high-capacity cache, and low memory latency, while the FPGA graph processing accelerator enjoys the scheduling benefit and delivers higher performance at excellent energy efficiency. To understand the effectiveness of our solution, we compared it with FPGA-only solutions for two scheduling schemes: the well-known Dijkstra scheduling for Single Source Shortest Path and a new scheduling optimization we developed for improving the data locality of Breadth First Search. Whereas the FPGA-only solution requires an impractical amount of on-chip storage to implement a priority queue, the proposed processor-assisted scheduling, which moves the task of scheduling to the processor, imposes a negligible load on the processor and retains most of the performance benefit of priority scheduling.
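To make the benefit of priority scheduling concrete, the following is a minimal software-only sketch of worklist-based Single Source Shortest Path, run once with a FIFO worklist and once with a Dijkstra-style priority worklist. It is not the paper's CPU-FPGA implementation; the function name `sssp_worklist`, the adjacency-list format, and the toy graph `example_graph` are assumptions made purely for illustration. The sketch only counts how many work items each ordering processes.

```python
import heapq
from collections import deque


def sssp_worklist(graph, source, use_priority):
    """Worklist-based SSSP (illustrative only).

    graph maps each vertex to a list of (neighbor, weight) pairs.
    Returns (distances, number of work items actually processed).
    """
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    processed = 0

    if use_priority:
        # Dijkstra-style order: always take the vertex with the smallest distance.
        worklist = [(0, source)]
        while worklist:
            d, u = heapq.heappop(worklist)
            if d > dist[u]:
                continue  # stale entry, a shorter path was already found
            processed += 1
            for v, w in graph[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(worklist, (dist[v], v))
    else:
        # FIFO order: vertices may be processed again when their distance improves.
        worklist = deque([source])
        while worklist:
            u = worklist.popleft()
            processed += 1
            for v, w in graph[u]:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    worklist.append(v)

    return dist, processed


# Hypothetical toy graph: the direct edge s -> a is expensive, the detour via b is cheap.
example_graph = {
    "s": [("a", 10), ("b", 1)],
    "a": [("c", 1)],
    "b": [("a", 1)],
    "c": [],
}

dist_fifo, work_fifo = sssp_worklist(example_graph, "s", use_priority=False)
dist_prio, work_prio = sssp_worklist(example_graph, "s", use_priority=True)
print(dist_prio, work_fifo, work_prio)
```

On this toy graph both orderings reach the same distances, but the FIFO worklist processes 6 work items while the priority worklist processes 4, because the FIFO order relaxes `a` and `c` with a tentative distance that is later improved. The priority queue that produces this saving is the structure the abstract describes as impractical to hold in on-chip storage, which is the motivation for moving it to the host processor.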
