Arch2End: Two-Stage Unified System-Level Modeling for Heterogeneous Intelligent Devices

IF 2.7 | Zone 3, Computer Science | Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | Pub Date: 2024-11-06 | DOI: 10.1109/TCAD.2024.3443706

Weihong Liu; Zongwei Zhu; Boyu Li; Yi Xiong; Zirui Lian; Jiawei Geng; Xuehai Zhou

Volume 43, Issue 11, pp. 4154-4165. Journal Article. https://ieeexplore.ieee.org/document/10745851/

Citations: 0

Abstract

The surge in intelligent edge computing has propelled the adoption and expansion of distributed embedded systems (DESs). Numerous scheduling strategies, such as latency-aware and group-based hierarchical scheduling, have been introduced to improve DES throughput. Effective device modeling supports modular, plug-in scheduler design. To keep scheduling interfaces uniform, unified device performance modeling is adopted, typically as system-level modeling that incorporates both the hardware and software stacks. Such methods fall broadly into two categories. Fine-grained methods based on hardware architecture analysis become very difficult to apply to a large number of heterogeneous devices, mainly because much architecture information is closed-source and costly to analyze. Coarse-grained methods rely on limited architecture information or benchmark models, and therefore generalize poorly to the complex inference performance of diverse deep neural networks (DNNs). We therefore introduce a two-stage system-level modeling method, Arch2End, which combines limited architecture information with scalable benchmark models to achieve a unified performance representation. Stage one leverages public information to analyze architectures under a uniform abstraction and to design benchmark models that explore device performance boundaries, ensuring uniformity. Stage two extracts critical device features from the end-to-end inference metrics of extensive simulation models, ensuring universality and enhancing characterization capacity. Compared with state-of-the-art methods, Arch2End achieves the lowest relative errors in DNN latency prediction on NAS-Bench-201 (1.7%) and on real-world DNNs (8.2%). It also shows superior performance when used for intergroup-balanced device grouping strategies.
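The abstract reports latency-prediction accuracy as a relative error. As a minimal sketch, assuming the metric is the mean of |predicted - measured| / measured over the evaluated networks (the abstract does not give the exact formula), it could be computed as follows. The function name and the sample latencies are hypothetical and are not taken from the paper.

```python
def mean_relative_error(predicted_ms, measured_ms):
    """Mean of |predicted - measured| / measured across all networks.

    Assumed definition; the abstract does not state the exact formula.
    """
    if len(predicted_ms) != len(measured_ms) or not measured_ms:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / m for p, m in zip(predicted_ms, measured_ms))
    return total / len(measured_ms)


# Illustrative latencies in milliseconds; made up for this example.
measured = [10.0, 20.0, 40.0]
predicted = [11.0, 19.0, 40.0]

error = mean_relative_error(predicted, measured)
print(f"mean relative error: {error:.1%}")
```

Under this definition, a reported figure such as 1.7% would mean that predicted latencies deviate from measured ones by 1.7% on average.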

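Source journal

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 5.60

Self-citation rate: 13.80%

Articles published: 500

Review time: 7 months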

Journal description: The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.