{"title":"决定:使用OpenCL在cpu、gpu和fpga上分发OpenVX应用程序","authors":"Lester Kalms, Tim Haering, Diana Göhringer","doi":"10.1109/IPDPSW55747.2022.00023","DOIUrl":null,"url":null,"abstract":"The demand for computer vision systems and algorithms is steadily increasing. However, users often have to deal with different or new languages, architectures and tools. Furthermore, there is often no linkage between vendors, defined standards, or model-based modularization to connect everything. We propose a modularized framework for distributing applications on heterogeneous systems consisting of CPUs, GPUs, and FPGAs. The user builds an OpenVX-compliant application without knowledge of the underlying hardware. The middleend automatically schedules and maps the nodes to the available OpenCL devices. Benefits of FPGA acceleration, such as pipelining and running multiple nodes in parallel, are taken into account. The backend generates a program including memory management, synchronization mechanisms and data transfers, even between vendors. This is executed in our parallelized OpenCL based runtime system with minimal overhead. We achieved speedups of 1.63 for a heterogeneous schedule in comparison to a single GPU design when limiting the FPGA resources. Without this limitation a speedup of 13.39 is achieved for the same application.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"104 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DECISION: Distributing OpenVX Applications on CPUs, GPUs and FPGAs using OpenCL\",\"authors\":\"Lester Kalms, Tim Haering, Diana Göhringer\",\"doi\":\"10.1109/IPDPSW55747.2022.00023\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The demand for computer vision systems and algorithms is steadily increasing. 
However, users often have to deal with different or new languages, architectures and tools. Furthermore, there is often no linkage between vendors, defined standards, or model-based modularization to connect everything. We propose a modularized framework for distributing applications on heterogeneous systems consisting of CPUs, GPUs, and FPGAs. The user builds an OpenVX-compliant application without knowledge of the underlying hardware. The middleend automatically schedules and maps the nodes to the available OpenCL devices. Benefits of FPGA acceleration, such as pipelining and running multiple nodes in parallel, are taken into account. The backend generates a program including memory management, synchronization mechanisms and data transfers, even between vendors. This is executed in our parallelized OpenCL based runtime system with minimal overhead. We achieved speedups of 1.63 for a heterogeneous schedule in comparison to a single GPU design when limiting the FPGA resources. Without this limitation a speedup of 13.39 is achieved for the same application.\",\"PeriodicalId\":286968,\"journal\":{\"name\":\"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"volume\":\"104 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Parallel and Distributed Processing Symposium Workshops 
(IPDPSW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW55747.2022.00023\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW55747.2022.00023","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
DECISION: Distributing OpenVX Applications on CPUs, GPUs and FPGAs using OpenCL
The demand for computer vision systems and algorithms is steadily increasing. However, users often have to deal with different or new languages, architectures, and tools. Furthermore, there is often no linkage between vendors, defined standards, or model-based modularization to connect everything. We propose a modularized framework for distributing applications on heterogeneous systems consisting of CPUs, GPUs, and FPGAs. The user builds an OpenVX-compliant application without knowledge of the underlying hardware. The middle-end automatically schedules and maps the nodes to the available OpenCL devices. Benefits of FPGA acceleration, such as pipelining and running multiple nodes in parallel, are taken into account. The backend generates a program including memory management, synchronization mechanisms, and data transfers, even between devices from different vendors. This is executed in our parallelized OpenCL-based runtime system with minimal overhead. We achieved a speedup of 1.63 for a heterogeneous schedule in comparison to a single-GPU design when limiting the FPGA resources. Without this limitation, a speedup of 13.39 is achieved for the same application.
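To give an intuition for the kind of decision the middle-end described above must make — assigning each node of a dataflow graph to one of several available devices — the sketch below implements a minimal greedy earliest-finish-time heuristic in Python. This is not the paper's scheduling algorithm; the node names, device labels, and cost estimates are all hypothetical, chosen only to illustrate why a heterogeneous mapping can beat a single-device one.

```python
def greedy_schedule(nodes, deps, costs):
    """Greedily map dataflow-graph nodes onto heterogeneous devices.

    nodes: node names in topological order.
    deps:  node -> list of predecessor nodes.
    costs: (node, device) -> estimated execution time on that device.
    Returns node -> (device, start_time, finish_time).
    """
    devices = sorted({d for (_, d) in costs})
    device_free = {d: 0.0 for d in devices}  # when each device becomes idle
    sched = {}
    for n in nodes:
        # A node may start once all its predecessors have finished.
        ready = max((sched[p][2] for p in deps.get(n, [])), default=0.0)
        best = None
        for d in devices:
            start = max(ready, device_free[d])
            finish = start + costs[(n, d)]
            if best is None or finish < best[2]:
                best = (d, start, finish)
        sched[n] = best
        device_free[best[0]] = best[2]
    return sched


if __name__ == "__main__":
    # Toy vision pipeline with made-up per-device cost estimates.
    nodes = ["read", "gauss", "sobel"]
    deps = {"gauss": ["read"], "sobel": ["gauss"]}
    costs = {
        ("read", "cpu"): 1.0, ("read", "gpu"): 2.0,
        ("gauss", "cpu"): 5.0, ("gauss", "gpu"): 1.0,
        ("sobel", "cpu"): 4.0, ("sobel", "gpu"): 1.0,
    }
    for n, (dev, start, finish) in greedy_schedule(nodes, deps, costs).items():
        print(f"{n}: {dev} [{start}, {finish}]")
```

In this toy instance the heuristic places the I/O-bound `read` node on the CPU and the filter nodes on the GPU, finishing at t=3.0, whereas running everything on either single device alone would take longer. The real framework additionally accounts for data-transfer costs between devices and FPGA-specific effects such as pipelining, which this sketch deliberately omits.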