Orchestration of co-operative and adaptive multi-core deep learning engines

Autonomous Vehicles and Machines. Pub Date: 2023-01-16. DOI: 10.2352/ei.2023.35.16.avm-112

Mihir Mody, Kumar Desappan, P. Swami, David Smith, Shyam Jagannathan, Kevin Lavery, Gregory Shultz, Jason Jones


Citations: 0

Abstract

Automated driving functions, such as highway driving and parking assist, are increasingly deployed in high-end cars, with the goal of realizing self-driving cars using deep learning (DL) techniques such as convolutional neural networks (CNNs) and Transformers. DL-based algorithms are used in many integral modules of Advanced Driver Assistance Systems (ADAS) and Automated Driving Systems. Camera-based perception, driver monitoring, driving policy, and radar and lidar perception are a few examples of modules built using DL algorithms in such systems. These real-time DL applications require huge compute, up to 250 TOPS, to run on an edge device. To meet these needs efficiently in terms of cost and power, silicon vendors provide complex SoCs with multiple DL engines. These SoCs also come with the system resources, such as L2/L3 on-chip memory, a high-speed DDR interface, and a PMIC, needed to feed data and power to the DL engines so that their compute is used efficiently. These system resources would otherwise scale linearly with the number of DL engines in the system. This paper proposes solutions that optimize these system resources to provide a cost- and power-efficient solution: (1) co-operative and adaptive asynchronous scheduling of DL engines to reduce peak resource usage along multiple dimensions such as memory size, throughput, and power/current; (2) orchestration of co-operative and adaptive multi-core DL engines to achieve synchronous execution with maximum utilization of all resources.
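The abstract does not describe the scheduling algorithm itself, so the following is only a minimal sketch of the general idea behind point (1): if several engines run the same periodic workload, staggering their start times can lower the peak of a shared resource compared with starting them all at once. Everything here is an assumption made for illustration and is not taken from the paper: the per-slot demand numbers, the function names `peak_usage` and `best_offsets`, and the brute-force search.

```python
from itertools import product


def peak_usage(profiles, offsets):
    """Peak of the summed per-slot demand when each profile is rotated by its offset."""
    period = len(profiles[0])
    totals = [0] * period
    for profile, offset in zip(profiles, offsets):
        for t, value in enumerate(profile):
            totals[(t + offset) % period] += value
    return max(totals)


def best_offsets(profiles):
    """Exhaustively search start offsets (engine 0 fixed at 0) for the lowest peak."""
    period = len(profiles[0])
    best = None
    for rest in product(range(period), repeat=len(profiles) - 1):
        offsets = (0, *rest)
        peak = peak_usage(profiles, offsets)
        if best is None or peak < best[0]:
            best = (peak, offsets)
    return best


# Hypothetical demand of one engine on a shared resource (e.g. DDR bandwidth),
# in arbitrary units per time slot: one heavy slot followed by three light ones.
profile = [8, 2, 2, 2]
profiles = [profile] * 4  # four identical engines (assumed)

synchronous_peak = peak_usage(profiles, (0, 0, 0, 0))
staggered_peak, offsets = best_offsets(profiles)

print("all engines start together:", synchronous_peak)
print("staggered starts:", staggered_peak, "with offsets", offsets)
```

In this toy setup, starting all engines together makes the heavy slots coincide, while staggered offsets spread them out, so the shared resource can be sized for a lower peak. A real scheduler would have to handle several resources at once and adapt at run time, which this sketch does not attempt.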



