Optimize AI pipelines with SYCL and OpenVINO

International Workshop on OpenCL | Pub Date: 2022-05-10 | DOI: 10.1145/3529538.3529561

Nico Galoppo

Citations: 0

Abstract

Sensor data processing pipelines that mix feature-engineered and deep-learning-based processing have become prevalent. For example, sensor fusion of point cloud data with RGB image streams is common in autonomous mobile robots and self-driving technology. Deep learning is the state of the art in computer vision for extracting semantic information from RGB data, and LiDAR odometry based on deep learning has recently made great advances [x]. At the same time, other processing components in "mixed" pipelines still use feature-engineered approaches that do not rely on deep neural nets.

Embedded compute platforms in robotics systems are inherently heterogeneous, often combining a variety of CPUs, (integrated) GPUs, VPUs, and so on. There is therefore a growing need to implement "mixed" pipelines on heterogeneous platforms that include a variety of xPUs. We want such pipeline implementations to benefit from the latest advancements in data- and thread-parallel computation, as well as from the state of the art in optimized inference of DNN models. SYCL and OpenVINO are two open, industry-supported APIs that allow a developer to do so.

Optimizing the individual components of the processing pipeline is important, but it is at least as important to optimize the data flow and minimize data copies. Doing so provides a way to benefit from the efficiencies in inference runtime and compute graph optimizations provided by OpenVINO, in combination with the extensibility that SYCL brings in implementing custom or non-DNN components. Similarly, the use of compatible synchronization primitives allows the different runtimes to schedule work more efficiently on the hardware and avoid execution hiccups.

In this talk, we will demonstrate the mechanisms and primitives that SYCL and OpenVINO provide to optimize the data flow between workloads implemented in the respective APIs and to execute those workloads efficiently. We will provide an example and show the impact on the overall throughput and latency of the end-to-end processing pipeline. The audience will learn to recognize inefficiencies in their pipelines using profiling tools and to remove them using an easy-to-follow optimization recipe. Finally, we will provide guidance to developers of inference engines other than OpenVINO on how to integrate similar interoperability features into their APIs, so that they too can offer optimized SYCL-enabled AI pipelines to their users.
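
The abstract's distinction between throughput and latency can be made concrete with a toy model. The sketch below is not taken from the talk and uses neither SYCL nor OpenVINO: the stage names and millisecond costs are hypothetical, and it assumes stages overlap perfectly across frames so that the slowest stage sets the frame rate. It only illustrates why removing a staging copy between two runtimes always shortens per-frame latency, while throughput improves only when the copy was the bottleneck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    millis: float  # hypothetical per-frame cost, not a measurement


def latency_ms(stages):
    """End-to-end latency of one frame: every stage runs in sequence."""
    return sum(s.millis for s in stages)


def throughput_fps(stages):
    """Steady-state frames per second, assuming stages overlap across frames.

    Under that assumption the slowest stage sets the pace.
    """
    return 1000.0 / max(s.millis for s in stages)


# Hypothetical pipeline: a custom (non-DNN) kernel, a staging copy between
# the two runtimes, and DNN inference. All numbers are made up.
with_copy = [
    Stage("custom preprocessing kernel", 4.0),
    Stage("staging copy between runtimes", 3.0),
    Stage("DNN inference", 8.0),
]
# Same pipeline when both runtimes share one buffer, so the copy disappears.
zero_copy = [s for s in with_copy if "copy" not in s.name]

for label, stages in (("with copy", with_copy), ("zero copy", zero_copy)):
    print(f"{label}: latency {latency_ms(stages):.1f} ms, "
          f"throughput {throughput_fps(stages):.1f} fps")
```

With these made-up numbers the copy costs 3 ms of latency per frame but leaves the modeled frame rate unchanged, because inference remains the slowest stage. A real pipeline can behave differently (copies also compete for memory bandwidth and for the device), which is why the abstract points to profiling tools rather than to a model like this one.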
