ARM-CO-UP: ARM COoperative Utilization of Processors

IF 2.2 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Design Automation of Electronic Systems Pub Date : 2024-04-08 DOI:10.1145/3656472

Ehsan Aghapour, Dolly Sapra, Andy Pimentel, Anuj Pathania


Citations: 0

Abstract

Heterogeneous multi-processor systems-on-chips (HMPSoCs) combine different processors on a single chip. They enable powerful embedded devices, which increasingly perform machine learning (ML) inference at the edge. State-of-the-art HMPSoCs can run inference on-chip using different processors, such as CPUs, GPUs, and NPUs. By using multiple processors cooperatively, HMPSoCs can potentially overcome the low performance and efficiency of single-processor convolutional neural network (CNN) inference. However, standard inference frameworks for edge devices typically use only a single processor.

We present ARM-CO-UP, a framework built on the ARM Compute Library (ARM-CL) that supports two modes of operation: Pipeline and Switch. In Pipeline mode, it optimizes inference throughput by executing network partitions in a pipeline across consecutive input frames. In Switch mode, it improves inference latency for a single input frame through layer-switched inference. In both modes, it also supports layer-wise CPU/GPU dynamic voltage and frequency scaling (DVFS) to improve power efficiency and reduce energy consumption. ARM-CO-UP is a comprehensive framework for multi-processor CNN inference that automates CNN partitioning and mapping, pipeline synchronization, processor type switching, layer-wise DVFS, and closed-source NPU integration.
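To make the difference between the two modes concrete, the following is a minimal sketch of the cost model each mode implies. It is not the ARM-CO-UP API or the authors' algorithm: the function names, the per-layer timings in `LAYER_MS`, and the switching cost `SWITCH_MS` are all hypothetical values chosen for illustration. It assumes that Pipeline-mode throughput is bounded by the slowest stage, and that Switch-mode latency is the sum of per-layer times plus a fixed cost whenever consecutive layers run on different processors.

```python
# Hypothetical per-layer latencies in milliseconds for a 4-layer network.
# These are illustrative numbers, NOT measurements from the paper.
LAYER_MS = {
    "cpu": [4.0, 6.0, 5.0, 3.0],
    "gpu": [2.0, 7.0, 2.5, 4.0],
}
SWITCH_MS = 0.5  # assumed cost of moving to a different processor


def pipeline_throughput(stages):
    """Frames per second for a pipeline of network partitions.

    stages: list of (processor, first_layer, end_layer_exclusive).
    In steady state, consecutive frames overlap across stages,
    so the slowest stage limits throughput.
    """
    stage_ms = [sum(LAYER_MS[proc][a:b]) for proc, a, b in stages]
    return 1000.0 / max(stage_ms)


def switch_latency(mapping):
    """Latency in ms for one frame under a per-layer processor mapping.

    mapping: one processor name per layer. A switching cost is added
    each time two consecutive layers use different processors.
    """
    compute = sum(LAYER_MS[proc][i] for i, proc in enumerate(mapping))
    switches = sum(1 for a, b in zip(mapping, mapping[1:]) if a != b)
    return compute + switches * SWITCH_MS


if __name__ == "__main__":
    # Pipeline mode: GPU runs layers 0-1, CPU runs layers 2-3.
    print(pipeline_throughput([("gpu", 0, 2), ("cpu", 2, 4)]))
    # Switch mode: pick the faster processor for each layer.
    print(switch_latency(["gpu", "cpu", "gpu", "cpu"]))
    # Baseline: the whole network on the GPU.
    print(switch_latency(["gpu"] * 4))
```

Under these made-up numbers, switching processors per layer beats the GPU-only baseline even after paying the switching cost three times, which is the kind of trade-off a layer-to-processor mapping has to weigh.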


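Source journal

ACM Transactions on Design Automation of Electronic Systems (Engineering & Technology, Computer Science: Software Engineering)

CiteScore

3.20

Self-citation rate

7.10%

Articles published

105

Review time

3 months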

Journal description: TODAES is a premier ACM journal in design and automation of electronic systems. It publishes innovative work documenting significant research and development advances on the specification, design, analysis, simulation, testing, and evaluation of electronic systems, emphasizing a computer science/engineering orientation. Both theoretical analysis and practical solutions are welcome.