CARIn: Constraint-Aware and Responsive Inference on Heterogeneous Devices for Single- and Multi-DNN Workloads

IF 2.8 · Region 3 (Computer Science) · JCR Q2 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

ACM Transactions on Embedded Computing Systems Pub Date : 2024-05-23 DOI:10.1145/3665868

Ioannis Panopoulos, Stylianos I. Venieris, I. Venieris


Citations: 0

Abstract

The relentless expansion of deep learning (DL) applications in recent years has prompted a pivotal shift towards on-device execution, driven by the urgent need for real-time processing, heightened privacy concerns, and reduced latency across diverse domains. This paper addresses the challenges inherent in optimising the execution of deep neural networks (DNNs) on mobile devices, with a focus on device heterogeneity, multi-DNN execution, and dynamic runtime adaptation. We introduce CARIn, a novel framework designed for the optimised deployment of both single- and multi-DNN applications under user-defined service-level objectives (SLOs). Leveraging an expressive multi-objective optimisation (MOO) framework and a runtime-aware sorting and search algorithm (RASS) as the MOO solver, CARIn facilitates efficient adaptation to dynamic conditions while addressing resource contention issues associated with multi-DNN execution. Notably, RASS generates a set of configurations, anticipating subsequent runtime adaptation, ensuring rapid, low-overhead adjustments in response to environmental fluctuations. Extensive evaluation across diverse tasks, including text classification, scene recognition, and face analysis, showcases the versatility of CARIn across various model architectures, such as Convolutional Neural Networks (CNNs) and Transformers, and realistic use cases. We observe a substantial enhancement in the fair treatment of the problem’s objectives, reaching 1.92× when compared to single-model designs, and up to 10.69× in contrast to the state-of-the-art OODIn framework. Additionally, we achieve a significant gain of up to 4.06× over hardware-unaware designs in multi-DNN applications. Finally, our framework sustains its performance while effectively eliminating the time overhead associated with identifying the optimal design in response to environmental challenges.
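The abstract describes solving the constrained selection problem once, producing a ranked set of configurations, and then switching among them at runtime without re-solving. The following is only a toy sketch of that general idea, not the RASS algorithm itself: the `Design` fields, the weighted-sum score, and the `rank_designs` and `pick` helpers are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Design:
    """One candidate configuration (all fields are illustrative assumptions)."""
    model: str         # hypothetical model variant name
    processor: str     # hypothetical compute engine, e.g. "cpu", "gpu", "npu"
    accuracy: float    # higher is better, in [0, 1]
    latency_ms: float  # lower is better
    memory_mb: float   # lower is better

def rank_designs(designs, max_latency_ms, max_memory_mb):
    """Offline step: drop designs that violate the constraints, sort the rest best-first.

    The score is a simple weighted sum chosen for this sketch; the paper's
    actual objective formulation is not reproduced here.
    """
    ok = [d for d in designs
          if d.latency_ms <= max_latency_ms and d.memory_mb <= max_memory_mb]
    if not ok:
        return []
    worst_latency = max(d.latency_ms for d in ok)

    def score(d):
        # Reward accuracy, penalise latency normalised to the slowest feasible design.
        return d.accuracy - 0.5 * d.latency_ms / worst_latency

    return sorted(ok, key=score, reverse=True)

def pick(ranked, unavailable_processors):
    """Runtime step: return the best pre-ranked design whose processor is usable.

    No optimisation is re-run here; this is a linear scan over the stored ranking.
    """
    for d in ranked:
        if d.processor not in unavailable_processors:
            return d
    return None
```

Under these assumptions, a change in device state (for instance, a processor becoming overloaded) only triggers a lookup in the precomputed ranking, which is the kind of low-overhead adjustment the abstract refers to.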




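Source journal

ACM Transactions on Embedded Computing Systems (Engineering & Technology: Computer Science, Software Engineering)

CiteScore: 3.70

Self-citation rate: 0.00%

Articles published: 138

Review time: 6 months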

Journal description: The design of embedded computing systems, both the software and hardware, increasingly relies on sophisticated algorithms, analytical models, and methodologies. ACM Transactions on Embedded Computing Systems (TECS) aims to present the leading work relating to the analysis, design, behavior, and experience with embedded computing systems.