NeuroBalancer: Balancing System Frequencies With Punctual Laziness for Timely and Energy-Efficient DNN Inferences

IF 7.7 | Zone 2, Computer Science | JCR Q1, Computer Science, Information Systems
Kyungmin Bin;Seyeon Kim;Sangtae Ha;Song Chong;Kyunghan Lee
DOI: 10.1109/TMC.2024.3524628
Journal: IEEE Transactions on Mobile Computing, vol. 24, no. 5, pp. 4339-4354
Published: 2025-01-01
Publication type: Journal Article
Source: https://ieeexplore.ieee.org/document/10819653/
Citations: 0

Abstract

On-device deep neural network (DNN) inference is often desirable for user experience and privacy. Existing solutions fully utilize resources to minimize inference latency. However, they are severely energy-inefficient because they complete DNN inference much earlier than the required service interval. This poses a new challenge: how to make DNN inferences in a punctual and energy-efficient manner. To tackle this challenge, we propose a new resource allocation strategy for DNN processing, namely punctual laziness, which disperses the workload as efficiently as possible over time within its strict delay constraint. This strategy is particularly beneficial for neural workloads, since a DNN comprises a set of popular operators whose latency and energy consumption are predictable. Building on this understanding, we propose NeuroBalancer, an operator-aware core and memory frequency scaling framework that balances those frequencies as efficiently as possible while making timely inferences. We implement and evaluate NeuroBalancer on off-the-shelf Android devices with various state-of-the-art DNN models. Our results show that NeuroBalancer successfully meets given inference latency requirements while reducing energy consumption by up to 43.9% and 21.1% compared to Android's default governor, and by up to 42.1% and 18.6% compared to SysScale, the state-of-the-art mobile governor, on CPU and GPU, respectively.
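The idea of finishing just in time instead of as early as possible can be illustrated with a toy model. The sketch below is not the NeuroBalancer algorithm: it assumes a single CPU frequency knob, a hypothetical list of frequency levels, operator latency equal to work divided by frequency, and dynamic power growing with the cube of frequency (a common textbook approximation). All names and numbers are illustrative assumptions, not values from the paper.

```python
from typing import List, Optional

# Hypothetical CPU frequency levels in GHz (illustrative, not from the paper).
FREQS_GHZ = [0.6, 1.0, 1.4, 1.8, 2.2]


def latency_ms(work_mcycles: float, freq_ghz: float) -> float:
    """Assumed latency model: megacycles / GHz gives milliseconds."""
    return work_mcycles / freq_ghz


def energy_units(work_mcycles: float, freq_ghz: float, k: float = 1.0) -> float:
    """Assumed energy model: power ~ k * f^3, so energy = power * time ~ k * f^2 * work."""
    return k * freq_ghz ** 2 * work_mcycles


def pick_frequency(ops_mcycles: List[float], deadline_ms: float) -> Optional[float]:
    """Return the lowest frequency whose total latency meets the deadline, else None."""
    total = sum(ops_mcycles)
    for f in sorted(FREQS_GHZ):
        if latency_ms(total, f) <= deadline_ms:
            return f
    return None  # the deadline cannot be met even at the highest level


if __name__ == "__main__":
    ops = [10.0, 20.0, 12.0]  # hypothetical per-operator work in megacycles
    chosen = pick_frequency(ops, deadline_ms=33.0)
    fastest = max(FREQS_GHZ)
    print("chosen frequency (GHz):", chosen)
    print("energy at chosen level:", energy_units(sum(ops), chosen))
    print("energy at fastest level:", energy_units(sum(ops), fastest))
```

Under these assumptions, running at the lowest level that still meets the deadline costs less energy than racing at the highest level, which is the intuition behind punctual laziness. The framework described in the abstract goes further by scaling both core and memory frequencies with operator awareness, which this sketch does not attempt.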
Source journal: IEEE Transactions on Mobile Computing (Engineering and Technology: Telecommunications)
CiteScore: 12.90
Self-citation rate: 2.50%
Articles published: 403
Review time: 6.6 months
About the journal: IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest include mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.