ARM-CO-UP: ARM COoperative Utilization of Processors

IF 2.2 4区 计算机科学 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Ehsan Aghapour, Dolly Sapra, Andy Pimentel, Anuj Pathania
{"title":"ARM-CO-UP: ARM COoperative Utilization of Processors","authors":"Ehsan Aghapour, Dolly Sapra, Andy Pimentel, Anuj Pathania","doi":"10.1145/3656472","DOIUrl":null,"url":null,"abstract":"<p>HMPSoCs combine different processors on a single chip. They enable powerful embedded devices, which increasingly perform ML inference tasks at the edge. State-of-the-art HMPSoCs can perform on-chip embedded inference using different processors, such as CPUs, GPUs, and NPUs. HMPSoCs can potentially overcome the limitation of low single-processor CNN inference performance and efficiency by cooperative use of multiple processors. However, standard inference frameworks for edge devices typically utilize only a single processor. </p><p>We present the <i>ARM-CO-UP</i> framework built on the <i>ARM-CL</i> library. The <i>ARM-CO-UP</i> framework supports two modes of operation – Pipeline and Switch. It optimizes inference throughput using pipelined execution of network partitions for consecutive input frames in the Pipeline mode. It improves inference latency through layer-switched inference for a single input frame in the Switch mode. Furthermore, it supports layer-wise CPU/GPU <i>DVFS</i> in both modes for improving power efficiency and energy consumption. <i>ARM-CO-UP</i> is a comprehensive framework for multi-processor CNN inference that automates CNN partitioning and mapping, pipeline synchronization, processor type switching, layer-wise <i>DVFS</i>, and closed-source NPU integration.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":2.2000,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Design Automation of Electronic Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3656472","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

Abstract

HMPSoCs combine different processors on a single chip. They enable powerful embedded devices, which increasingly perform ML inference tasks at the edge. State-of-the-art HMPSoCs can perform on-chip embedded inference using different processors, such as CPUs, GPUs, and NPUs. HMPSoCs can potentially overcome the limitation of low single-processor CNN inference performance and efficiency by cooperative use of multiple processors. However, standard inference frameworks for edge devices typically utilize only a single processor.

We present the ARM-CO-UP framework built on the ARM-CL library. The ARM-CO-UP framework supports two modes of operation – Pipeline and Switch. It optimizes inference throughput using pipelined execution of network partitions for consecutive input frames in the Pipeline mode. It improves inference latency through layer-switched inference for a single input frame in the Switch mode. Furthermore, it supports layer-wise CPU/GPU DVFS in both modes for improving power efficiency and energy consumption. ARM-CO-UP is a comprehensive framework for multi-processor CNN inference that automates CNN partitioning and mapping, pipeline synchronization, processor type switching, layer-wise DVFS, and closed-source NPU integration.

ARM-CO-UP:ARM 处理器的协同利用
HMPSoC 在单个芯片上集成了不同的处理器。它们支持功能强大的嵌入式设备,这些设备越来越多地在边缘执行 ML 推断任务。最先进的 HMPSoC 可以使用不同的处理器(如 CPU、GPU 和 NPU)执行片上嵌入式推理。HMPSoC 可以通过合作使用多个处理器来克服单处理器 CNN 推理性能和效率低的限制。然而,用于边缘设备的标准推理框架通常只使用单个处理器。我们介绍了基于 ARM-CL 库的 ARM-CO-UP 框架。ARM-CO-UP 框架支持两种运行模式--管道和交换。在流水线模式下,它通过流水线执行连续输入帧的网络分区来优化推理吞吐量。在开关模式下,它通过对单个输入帧进行层切换推理来改善推理延迟。此外,它还支持这两种模式下的 CPU/GPU DVFS 分层,以提高能效和能耗。ARM-CO-UP 是用于多处理器 CNN 推理的综合框架,可自动进行 CNN 分区和映射、流水线同步、处理器类型切换、分层 DVFS 和闭源 NPU 集成。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
ACM Transactions on Design Automation of Electronic Systems
ACM Transactions on Design Automation of Electronic Systems 工程技术-计算机:软件工程
CiteScore
3.20
自引率
7.10%
发文量
105
审稿时长
3 months
期刊介绍: TODAES is a premier ACM journal in design and automation of electronic systems. It publishes innovative work documenting significant research and development advances on the specification, design, analysis, simulation, testing, and evaluation of electronic systems, emphasizing a computer science/engineering orientation. Both theoretical analysis and practical solutions are welcome.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信