AMRE: Adaptive Multilevel Redundancy Elimination for Multimodal Mobile Inference

Impact Factor: 7.7 · CAS Tier 2 (Computer Science) · JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
Qixuan Cai;Ruikai Chu;Kaixuan Zhang;Xiulong Liu;Xinyu Tong;Xin Xie;Jiancheng Chen;Keqiu Li
DOI: 10.1109/TMC.2025.3549422
IEEE Transactions on Mobile Computing, vol. 24, no. 8, pp. 7568-7583
Published: 2025-03-10
https://ieeexplore.ieee.org/document/10918837/
Citations: 0

Abstract

Given privacy and network load concerns, employing on-device multimodal neural networks (MNNs) for IoT data is a growing trend. However, the high computational demands of MNNs clash with limited on-device resources. MNNs involve input and model redundancies during inference, wasting resources to process redundant input components and run excess model parameters. Model Redundancy Elimination (MRE) reduces redundant parameters but cannot bypass inference for unnecessary input components. Input Redundancy Elimination (IRE) skips inference for redundant input components but cannot reduce computation for the remaining parts. MRE and IRE independently fail to meet the diverse computational needs of multimodal inference. To address these issues, we aim to combine the advantages of MRE and IRE to achieve a more efficient inference. We propose an adaptive multilevel redundancy elimination framework (AMRE), which supports both IRE and MRE. AMRE first establishes a collaborative inference mechanism for IRE and MRE. We then propose a multifunctional, lightweight policy model that adaptively controls the inference logic for each instance. Moreover, a three-stage training method is proposed to ensure the performance of collaborative inference in AMRE. We validate AMRE in three scenarios, achieving up to 52.91% lower latency, 56.79% lower energy cost, and a slight accuracy gain compared to state-of-the-art baselines.
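The abstract's core idea can be illustrated with a toy sketch. This is not the paper's implementation; the `policy` heuristic, thresholds, and dummy backbone below are invented for illustration only. It shows the two decisions AMRE's lightweight policy model makes per instance: which redundant modality inputs to skip entirely (IRE) and how much of the model to run (MRE).

```python
# Hedged illustration of policy-gated multimodal inference (IRE + MRE).
# All names and thresholds here are assumptions, not the paper's method.
import numpy as np

def policy(features):
    """Toy stand-in for a lightweight policy model.

    Returns a boolean keep-mask over modalities (the IRE decision) and
    a depth fraction in (0, 1] for the backbone (the MRE decision)."""
    # Keep a modality only if its feature energy clears a threshold.
    energy = np.array([np.mean(f ** 2) for f in features])
    keep = energy > 0.1
    if not keep.any():                    # always keep the strongest one
        keep[int(np.argmax(energy))] = True
    # "Easier" instances (low total energy) get a shallower backbone.
    depth = 0.5 if energy.sum() < 1.0 else 1.0
    return keep, depth

def inference(features, n_layers=8):
    keep, depth = policy(features)
    used = [f for f, k in zip(features, keep) if k]   # IRE: skip inputs
    x = np.concatenate(used)
    layers_run = max(1, int(n_layers * depth))        # MRE: fewer layers
    for _ in range(layers_run):                       # dummy backbone step
        x = np.tanh(x)
    return x, keep, layers_run
```

In the actual framework the policy model is trained (via the paper's three-stage method) rather than thresholded by hand, but the control flow, one cheap decision gating both input skipping and model reduction, is the part this sketch captures.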
Source Journal
IEEE Transactions on Mobile Computing
Category: Engineering & Technology – Telecommunications
CiteScore: 12.90
Self-citation rate: 2.50%
Articles per year: 403
Review time: 6.6 months
Journal description: IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, covering aspects like mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.