Real-Time Scheduling of Machine Learning Operations on Heterogeneous Neuromorphic SoC

Anup Das
{"title":"Real-Time Scheduling of Machine Learning Operations on Heterogeneous Neuromorphic SoC","authors":"Anup Das","doi":"10.1109/MEMOCODE57689.2022.9954596","DOIUrl":null,"url":null,"abstract":"Neuromorphic Systems-on-Chip (NSoCs) are becoming heterogeneous by integrating general-purpose processors (GPPs) and neural processing units (NPUs) on the same SoC. For embedded systems, an NSoC may need to execute user applications built using a variety of machine learning models. We propose a real-time scheduler, called PRISM, which can schedule machine learning models on a heterogeneous NSoC either individually or concurrently to improve their system performance. PRISM consists of the following four key steps. First, it constructs an interprocessor communication (IPC) graph of a machine learning model from a mapping and a self-timed schedule. Second, it creates a transaction order for the communication actors and embeds this order into the IPC graph. Third, it schedules the graph on an NSoC by overlapping communication with the computation. Finally, it uses a Hill Climbing heuristic to explore the design space of mapping operations on GPPs and NPUs to improve the performance. Unlike existing schedulers which use only the NPUs of an NSoC, PRISM improves performance by enabling batch, pipeline, and operation parallelism via exploiting a platform's heterogeneity. For use-cases with concurrent applications, PRISM uses a heuristic resource sharing strategy and a non-preemptive scheduling to reduce the expected wait time before concurrent operations can be scheduled on contending resources. Our extensive evaluations with 20 machine learning workloads show that PRISM significantly improves the performance per watt for both individual applications and use-cases when compared to state-of-the-art schedulers.","PeriodicalId":157326,"journal":{"name":"2022 20th ACM-IEEE International Conference on Formal Methods and Models for System Design (MEMOCODE)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 20th ACM-IEEE International Conference on Formal Methods and Models for System Design (MEMOCODE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MEMOCODE57689.2022.9954596","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Neuromorphic Systems-on-Chip (NSoCs) are becoming heterogeneous by integrating general-purpose processors (GPPs) and neural processing units (NPUs) on the same SoC. For embedded systems, an NSoC may need to execute user applications built using a variety of machine learning models. We propose a real-time scheduler, called PRISM, which can schedule machine learning models on a heterogeneous NSoC either individually or concurrently to improve their system performance. PRISM consists of the following four key steps. First, it constructs an interprocessor communication (IPC) graph of a machine learning model from a mapping and a self-timed schedule. Second, it creates a transaction order for the communication actors and embeds this order into the IPC graph. Third, it schedules the graph on an NSoC by overlapping communication with the computation. Finally, it uses a Hill Climbing heuristic to explore the design space of mapping operations on GPPs and NPUs to improve the performance. Unlike existing schedulers which use only the NPUs of an NSoC, PRISM improves performance by enabling batch, pipeline, and operation parallelism via exploiting a platform's heterogeneity. For use-cases with concurrent applications, PRISM uses a heuristic resource sharing strategy and a non-preemptive scheduling to reduce the expected wait time before concurrent operations can be scheduled on contending resources. Our extensive evaluations with 20 machine learning workloads show that PRISM significantly improves the performance per watt for both individual applications and use-cases when compared to state-of-the-art schedulers.
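The abstract's final step, a Hill Climbing search over the operation-to-processor mapping, can be illustrated with a minimal sketch. The operation names, latency table, and communication penalty below are hypothetical placeholders, not PRISM's actual IPC-graph-derived cost model; the sketch only shows the accept-if-better local search over GPP/NPU assignments.

```python
import random

# Hypothetical per-operation latencies (ms) on each processor type.
# PRISM's real cost comes from the IPC graph and self-timed schedule.
OP_LATENCY = {
    "conv1": {"GPP": 4.0, "NPU": 1.2},
    "conv2": {"GPP": 6.5, "NPU": 1.8},
    "pool":  {"GPP": 0.9, "NPU": 1.1},
    "fc":    {"GPP": 3.2, "NPU": 0.8},
}
PROCESSORS = ["GPP", "NPU"]


def makespan(mapping):
    """Toy cost: the most loaded processor plus a fixed penalty for each
    GPP<->NPU boundary, standing in for inter-processor communication."""
    load = {p: 0.0 for p in PROCESSORS}
    for op, proc in mapping.items():
        load[proc] += OP_LATENCY[op][proc]
    ops = list(OP_LATENCY)
    comm = sum(0.5 for a, b in zip(ops, ops[1:]) if mapping[a] != mapping[b])
    return max(load.values()) + comm


def hill_climb(iterations=200, seed=0):
    """Flip one operation's processor at a time, keeping only improving moves."""
    rng = random.Random(seed)
    mapping = {op: rng.choice(PROCESSORS) for op in OP_LATENCY}
    best = makespan(mapping)
    for _ in range(iterations):
        op = rng.choice(list(OP_LATENCY))
        candidate = dict(mapping)
        candidate[op] = "NPU" if mapping[op] == "GPP" else "GPP"
        cost = makespan(candidate)
        if cost < best:
            mapping, best = candidate, cost
    return mapping, best


if __name__ == "__main__":
    mapping, cost = hill_climb()
    print(f"best mapping: {mapping}, estimated makespan: {cost:.1f} ms")
```

A real implementation would evaluate each candidate mapping by re-running the self-timed schedule on the IPC graph (steps one through three above) rather than this toy load-plus-penalty estimate.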