A Short-Transfer Model for Tightly-Coupled CPU-FPGA Platforms
Alexander Kroh, O. Diessel
2018 International Conference on Field-Programmable Technology (FPT), published 2018-12-01
DOI: 10.1109/FPT.2018.00075 (https://doi.org/10.1109/FPT.2018.00075)
Due to the cost of repeated data movement between CPU and FPGA, FPGA-based accelerators have traditionally been limited to offloading long-running tasks from the CPU to programmable logic. Although modern heterogeneous platforms, such as Zynq and HARP, reduce the cost of CPU-FPGA data transfers, the traditional offload model remains the popular choice. For these systems to become truly heterogeneous, the utilisation of all computational resources should be optimised. In particular, the CPU and FPGA should cooperate by dividing the workload between them so as to maximise system throughput. We first derive a model that predicts the optimum partitioning of a workload between hardware and software. We then measure the performance of short transfers between CPU and FPGA on the Zynq CPU-FPGA platform. Such transfers are essential for efficient synchronisation between cooperating hardware and software tasks. Finally, we demonstrate that, for an accumulator task, our derived model selects a workload partitioning within 8% of the optimum and predicts the execution time to within 12%.
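
The abstract does not give the model itself, so the following is only an illustrative sketch of the general idea of partitioning a workload between software and hardware, not the authors' derivation. It assumes a hypothetical linear cost model: the CPU processes a fraction x of n items at t_cpu per item, while the FPGA processes the remainder at t_fpga per item after a fixed synchronisation (short-transfer) cost t_sync. All names and parameters are assumptions made for illustration. Under these assumptions, the two sides run concurrently and the total time is minimised when both finish together.

```python
def execution_time(x, n, t_cpu, t_fpga, t_sync):
    """Total time when fraction x of n items runs in software.

    Assumed (hypothetical) model: CPU and FPGA work concurrently, and the
    FPGA pays a fixed synchronisation cost t_sync only if it is used at all.
    """
    sw = x * n * t_cpu
    hw = 0.0 if x >= 1.0 else t_sync + (1.0 - x) * n * t_fpga
    return max(sw, hw)


def optimum_split(n, t_cpu, t_fpga, t_sync):
    """Software fraction x at which both sides finish at the same time.

    Solves x*n*t_cpu = t_sync + (1 - x)*n*t_fpga for x, then clamps to
    [0, 1]; x = 1 means the synchronisation cost outweighs any offload gain.
    """
    x = (t_sync + n * t_fpga) / (n * (t_cpu + t_fpga))
    return min(1.0, max(0.0, x))


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the paper.
    n, t_cpu, t_fpga, t_sync = 1000, 2.0, 1.0, 100.0
    x = optimum_split(n, t_cpu, t_fpga, t_sync)
    print(x, execution_time(x, n, t_cpu, t_fpga, t_sync))
```

In this sketch, a larger t_sync pushes more of the work onto the CPU, which is one way to see why the cost of short transfers matters when hardware and software tasks cooperate.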