{"title":"决定:使用OpenCL在cpu、gpu和fpga上分发OpenVX应用程序","authors":"Lester Kalms, Tim Haering, Diana Göhringer","doi":"10.1109/IPDPSW55747.2022.00023","DOIUrl":null,"url":null,"abstract":"The demand for computer vision systems and algorithms is steadily increasing. However, users often have to deal with different or new languages, architectures and tools. Furthermore, there is often no linkage between vendors, defined standards, or model-based modularization to connect everything. We propose a modularized framework for distributing applications on heterogeneous systems consisting of CPUs, GPUs, and FPGAs. The user builds an OpenVX-compliant application without knowledge of the underlying hardware. The middleend automatically schedules and maps the nodes to the available OpenCL devices. Benefits of FPGA acceleration, such as pipelining and running multiple nodes in parallel, are taken into account. The backend generates a program including memory management, synchronization mechanisms and data transfers, even between vendors. This is executed in our parallelized OpenCL based runtime system with minimal overhead. We achieved speedups of 1.63 for a heterogeneous schedule in comparison to a single GPU design when limiting the FPGA resources. Without this limitation a speedup of 13.39 is achieved for the same application.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"104 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DECISION: Distributing OpenVX Applications on CPUs, GPUs and FPGAs using OpenCL\",\"authors\":\"Lester Kalms, Tim Haering, Diana Göhringer\",\"doi\":\"10.1109/IPDPSW55747.2022.00023\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The demand for computer vision systems and algorithms is steadily increasing. 
However, users often have to deal with different or new languages, architectures and tools. Furthermore, there is often no linkage between vendors, defined standards, or model-based modularization to connect everything. We propose a modularized framework for distributing applications on heterogeneous systems consisting of CPUs, GPUs, and FPGAs. The user builds an OpenVX-compliant application without knowledge of the underlying hardware. The middleend automatically schedules and maps the nodes to the available OpenCL devices. Benefits of FPGA acceleration, such as pipelining and running multiple nodes in parallel, are taken into account. The backend generates a program including memory management, synchronization mechanisms and data transfers, even between vendors. This is executed in our parallelized OpenCL based runtime system with minimal overhead. We achieved speedups of 1.63 for a heterogeneous schedule in comparison to a single GPU design when limiting the FPGA resources. Without this limitation a speedup of 13.39 is achieved for the same application.\",\"PeriodicalId\":286968,\"journal\":{\"name\":\"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"volume\":\"104 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Parallel and Distributed Processing Symposium Workshops 
(IPDPSW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW55747.2022.00023\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW55747.2022.00023","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
DECISION: Distributing OpenVX Applications on CPUs, GPUs and FPGAs using OpenCL
The demand for computer vision systems and algorithms is steadily increasing. However, users often have to deal with different or new languages, architectures, and tools. Furthermore, there is often no linkage between vendors, defined standards, or model-based modularization to connect everything. We propose a modularized framework for distributing applications on heterogeneous systems consisting of CPUs, GPUs, and FPGAs. The user builds an OpenVX-compliant application without knowledge of the underlying hardware. The middle-end automatically schedules and maps the nodes to the available OpenCL devices. Benefits of FPGA acceleration, such as pipelining and running multiple nodes in parallel, are taken into account. The backend generates a program including memory management, synchronization mechanisms, and data transfers, even between devices from different vendors. This is executed in our parallelized OpenCL-based runtime system with minimal overhead. We achieved a speedup of 1.63 for a heterogeneous schedule in comparison to a single-GPU design when limiting the FPGA resources. Without this limitation, a speedup of 13.39 is achieved for the same application.
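To give an intuition for the kind of decision the middle-end described above must make — assigning each node of a dataflow graph to one of several available devices — the sketch below implements a minimal greedy earliest-finish-time heuristic in Python. This is not the paper's scheduling algorithm; the node names, device labels, and cost estimates are all hypothetical, chosen only to illustrate why a heterogeneous mapping can beat a single-device one.

```python
def greedy_schedule(nodes, deps, costs):
    """Greedily map dataflow-graph nodes onto heterogeneous devices.

    nodes: node names in topological order.
    deps:  node -> list of predecessor nodes.
    costs: (node, device) -> estimated execution time on that device.
    Returns node -> (device, start_time, finish_time).
    """
    devices = sorted({d for (_, d) in costs})
    device_free = {d: 0.0 for d in devices}  # when each device becomes idle
    sched = {}
    for n in nodes:
        # A node may start once all its predecessors have finished.
        ready = max((sched[p][2] for p in deps.get(n, [])), default=0.0)
        best = None
        for d in devices:
            start = max(ready, device_free[d])
            finish = start + costs[(n, d)]
            if best is None or finish < best[2]:
                best = (d, start, finish)
        sched[n] = best
        device_free[best[0]] = best[2]
    return sched


if __name__ == "__main__":
    # Toy vision pipeline with made-up per-device cost estimates.
    nodes = ["read", "gauss", "sobel"]
    deps = {"gauss": ["read"], "sobel": ["gauss"]}
    costs = {
        ("read", "cpu"): 1.0, ("read", "gpu"): 2.0,
        ("gauss", "cpu"): 5.0, ("gauss", "gpu"): 1.0,
        ("sobel", "cpu"): 4.0, ("sobel", "gpu"): 1.0,
    }
    for n, (dev, start, finish) in greedy_schedule(nodes, deps, costs).items():
        print(f"{n}: {dev} [{start}, {finish}]")
```

In this toy instance the heuristic places the I/O-bound `read` node on the CPU and the filter nodes on the GPU, finishing at t=3.0, whereas running everything on either single device alone would take longer. The real framework additionally accounts for data-transfer costs between devices and FPGA-specific effects such as pipelining, which this sketch deliberately omits.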