CPU-FPGA紧密耦合平台的短传输模型

2018 International Conference on Field-Programmable Technology (FPT) Pub Date : 2018-12-01 DOI:10.1109/FPT.2018.00075

Alexander Kroh, O. Diessel

{"title":"CPU-FPGA紧密耦合平台的短传输模型","authors":"Alexander Kroh, O. Diessel","doi":"10.1109/FPT.2018.00075","DOIUrl":null,"url":null,"abstract":"Due to the cost of repeated data movement between CPU and FPGA, the use of FPGA-based accelerators has traditionally been limited to offloading long-running tasks from the CPU to programmable logic. Although modern heterogeneous platforms, such as Zynq and HARP, reduce the costs of CPU-FPGA data transfers, the traditional offload model is cemented as the popular choice. For these systems to become truly heterogeneous, the utilisation of all computational resources should be optimised. In particular, the CPU and FPGA should cooperate by dividing the workload between them so as to maximize system throughput. We first derive a model that predicts the optimum partitioning of a workload between hardware and software. We then measure the performance of short transfers between CPU and FPGA on the Zynq CPU-FPGA platform. Such transfers are essential to efficiently synchronise between cooperating hardware and software tasks. Finally, we demonstrate how our derived model can be used to choose the optimum workload partitioning to within 8% of the optimum for an accumulator task and predict its execution time within 12%.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Short-Transfer Model for Tightly-Coupled CPU-FPGA Platforms\",\"authors\":\"Alexander Kroh, O. Diessel\",\"doi\":\"10.1109/FPT.2018.00075\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to the cost of repeated data movement between CPU and FPGA, the use of FPGA-based accelerators has traditionally been limited to offloading long-running tasks from the CPU to programmable logic. Although modern heterogeneous platforms, such as Zynq and HARP, reduce the costs of CPU-FPGA data transfers, the traditional offload model is cemented as the popular choice. For these systems to become truly heterogeneous, the utilisation of all computational resources should be optimised. In particular, the CPU and FPGA should cooperate by dividing the workload between them so as to maximize system throughput. We first derive a model that predicts the optimum partitioning of a workload between hardware and software. We then measure the performance of short transfers between CPU and FPGA on the Zynq CPU-FPGA platform. Such transfers are essential to efficiently synchronise between cooperating hardware and software tasks. Finally, we demonstrate how our derived model can be used to choose the optimum workload partitioning to within 8% of the optimum for an accumulator task and predict its execution time within 12%.\",\"PeriodicalId\":434541,\"journal\":{\"name\":\"2018 International Conference on Field-Programmable Technology (FPT)\",\"volume\":\"81 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 International Conference on Field-Programmable Technology (FPT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FPT.2018.00075\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 International Conference on Field-Programmable Technology (FPT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FPT.2018.00075","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

由于CPU和FPGA之间重复数据移动的成本，基于FPGA的加速器的使用传统上仅限于将长时间运行的任务从CPU卸载到可编程逻辑。尽管现代异构平台(如Zynq和HARP)降低了CPU-FPGA数据传输的成本，但传统的卸载模型仍然是流行的选择。为了使这些系统成为真正的异构系统，应该优化所有计算资源的利用。特别是，CPU和FPGA应该协同工作，在它们之间划分工作负载，以最大限度地提高系统吞吐量。我们首先推导了一个模型，该模型预测了硬件和软件之间工作负载的最佳分区。然后，我们在Zynq CPU-FPGA平台上测量CPU和FPGA之间的短传输性能。这种传输对于在协作的硬件和软件任务之间有效同步至关重要。最后，我们演示了如何使用我们导出的模型为累加器任务选择最优的工作负载分区，该分区在最优的8%以内，并预测其执行时间在12%以内。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A Short-Transfer Model for Tightly-Coupled CPU-FPGA Platforms

Due to the cost of repeated data movement between CPU and FPGA, the use of FPGA-based accelerators has traditionally been limited to offloading long-running tasks from the CPU to programmable logic. Although modern heterogeneous platforms, such as Zynq and HARP, reduce the costs of CPU-FPGA data transfers, the traditional offload model is cemented as the popular choice. For these systems to become truly heterogeneous, the utilisation of all computational resources should be optimised. In particular, the CPU and FPGA should cooperate by dividing the workload between them so as to maximize system throughput. We first derive a model that predicts the optimum partitioning of a workload between hardware and software. We then measure the performance of short transfers between CPU and FPGA on the Zynq CPU-FPGA platform. Such transfers are essential to efficiently synchronise between cooperating hardware and software tasks. Finally, we demonstrate how our derived model can be used to choose the optimum workload partitioning to within 8% of the optimum for an accumulator task and predict its execution time within 12%.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2018 International Conference on Field-Programmable Technology (FPT)

自引率

0.00%

发文量