CARIn: Constraint-Aware and Responsive Inference on Heterogeneous Devices for Single- and Multi-DNN Workloads

IF 2.8 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Ioannis Panopoulos, Stylianos I. Venieris, I. Venieris
DOI: 10.1145/3665868 (https://doi.org/10.1145/3665868)
Journal: ACM Transactions on Embedded Computing Systems
Published: 2024-05-23 (Journal Article)

Abstract

The relentless expansion of deep learning (DL) applications in recent years has prompted a pivotal shift towards on-device execution, driven by the urgent need for real-time processing, heightened privacy concerns, and reduced latency across diverse domains. This paper addresses the challenges inherent in optimising the execution of deep neural networks (DNNs) on mobile devices, with a focus on device heterogeneity, multi-DNN execution, and dynamic runtime adaptation. We introduce CARIn, a novel framework designed for the optimised deployment of both single- and multi-DNN applications under user-defined service-level objectives (SLOs). Leveraging an expressive multi-objective optimisation (MOO) framework and a runtime-aware sorting and search algorithm (RASS) as the MOO solver, CARIn facilitates efficient adaptation to dynamic conditions while addressing resource contention issues associated with multi-DNN execution. Notably, RASS generates a set of configurations, anticipating subsequent runtime adaptation, ensuring rapid, low-overhead adjustments in response to environmental fluctuations. Extensive evaluation across diverse tasks, including text classification, scene recognition, and face analysis, showcases the versatility of CARIn across various model architectures, such as Convolutional Neural Networks (CNNs) and Transformers, and realistic use cases. We observe a substantial enhancement in the fair treatment of the problem’s objectives, reaching 1.92× when compared to single-model designs, and up to 10.69× in contrast to the state-of-the-art OODIn framework. Additionally, we achieve a significant gain of up to 4.06× over hardware-unaware designs in multi-DNN applications. Finally, our framework sustains its performance while effectively eliminating the time overhead associated with identifying the optimal design in response to environmental challenges.
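The idea of solving once for a ranked set of configurations, then adapting at runtime by lookup instead of re-solving, can be illustrated with a toy sketch. This is not RASS as defined in the paper: the abstract does not specify the algorithm, so the `Design` fields, the equal-weight score, the processor labels, and all numbers below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """One candidate configuration: a model variant mapped to a processor."""
    model: str
    processor: str      # hypothetical labels such as "CPU", "GPU", "NPU"
    accuracy: float     # higher is better
    latency_ms: float   # lower is better
    memory_mb: float    # lower is better


def feasible(d, max_latency_ms, max_memory_mb):
    """Check the user-defined constraints (the SLOs of this toy setup)."""
    return d.latency_ms <= max_latency_ms and d.memory_mb <= max_memory_mb


def rank_designs(designs, max_latency_ms, max_memory_mb):
    """Filter by constraints, then sort best-first by a normalised score."""
    pool = [d for d in designs if feasible(d, max_latency_ms, max_memory_mb)]
    if not pool:
        return []
    best_acc = max(d.accuracy for d in pool)
    best_lat = min(d.latency_ms for d in pool)

    def score(d):
        # Each term lies in (0, 1]; the equal weights are an arbitrary choice.
        return 0.5 * d.accuracy / best_acc + 0.5 * best_lat / d.latency_ms

    return sorted(pool, key=score, reverse=True)


def pick(ranked, unavailable=frozenset()):
    """Return the best-ranked design whose processor is still usable."""
    for d in ranked:
        if d.processor not in unavailable:
            return d
    return None


candidates = [
    Design("large", "GPU", accuracy=0.90, latency_ms=40.0, memory_mb=300.0),
    Design("small", "CPU", accuracy=0.80, latency_ms=30.0, memory_mb=100.0),
    Design("large", "CPU", accuracy=0.90, latency_ms=120.0, memory_mb=300.0),
    Design("tiny", "NPU", accuracy=0.70, latency_ms=10.0, memory_mb=50.0),
]

# Solve once, ahead of time: the ranked list is the "set of configurations".
ranked = rank_designs(candidates, max_latency_ms=100.0, max_memory_mb=400.0)

# At runtime, adapting to a change is a cheap scan of the precomputed list.
initial = pick(ranked)
fallback = pick(ranked, unavailable={"NPU"})
print(initial.model, initial.processor)    # tiny NPU
print(fallback.model, fallback.processor)  # large GPU
```

In this sketch the expensive step (filtering and ranking) runs once, and a runtime event such as a processor becoming overloaded only triggers a linear scan of the precomputed list. That is the kind of low-overhead adjustment the abstract describes, under the simplifying assumptions stated above.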
Source journal
ACM Transactions on Embedded Computing Systems (category: Engineering and Technology, Computer Science: Software Engineering)
CiteScore: 3.70
Self-citation rate: 0.00%
Articles published: 138
Review time: 6 months
Journal description: The design of embedded computing systems, both the software and hardware, increasingly relies on sophisticated algorithms, analytical models, and methodologies. ACM Transactions on Embedded Computing Systems (TECS) aims to present the leading work relating to the analysis, design, behavior, and experience with embedded computing systems.