AMRE: Adaptive Multilevel Redundancy Elimination for Multimodal Mobile Inference
Qixuan Cai; Ruikai Chu; Kaixuan Zhang; Xiulong Liu; Xinyu Tong; Xin Xie; Jiancheng Chen; Keqiu Li
IEEE Transactions on Mobile Computing, vol. 24, no. 8, pp. 7568-7583. Published 2025-03-10.
DOI: 10.1109/TMC.2025.3549422
Citation count: 0
Abstract
Given privacy and network load concerns, employing on-device multimodal neural networks (MNNs) for IoT data is a growing trend. However, the high computational demands of MNNs clash with limited on-device resources. MNNs involve input and model redundancies during inference, wasting resources to process redundant input components and run excess model parameters. Model Redundancy Elimination (MRE) reduces redundant parameters but cannot bypass inference for unnecessary input components. Input Redundancy Elimination (IRE) skips inference for redundant input components but cannot reduce computation for the remaining parts. MRE and IRE independently fail to meet the diverse computational needs of multimodal inference. To address these issues, we aim to combine the advantages of MRE and IRE to achieve a more efficient inference. We propose an adaptive multilevel redundancy elimination framework (AMRE), which supports both IRE and MRE. AMRE first establishes a collaborative inference mechanism for IRE and MRE. We then propose a multifunctional, lightweight policy model that adaptively controls the inference logic for each instance. Moreover, a three-stage training method is proposed to ensure the performance of collaborative inference in AMRE. We validate AMRE in three scenarios, achieving up to 52.91% lower latency, 56.79% lower energy cost, and a slight accuracy gain compared to state-of-the-art baselines.
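The abstract describes a lightweight policy model that, per instance, skips redundant input components (IRE) and shrinks the model run on the remaining ones (MRE). The paper's actual policy architecture is not given here, so the sketch below is purely illustrative: the saliency heuristic, the 0.1/0.5 thresholds, the width fractions, and the toy encoders are all assumptions, not AMRE's method.

```python
# Hypothetical sketch of AMRE-style collaborative IRE + MRE inference.
# All names, thresholds, and encoders are illustrative assumptions.

def policy(instance):
    """Lightweight policy: per instance, choose which modalities to keep
    (input redundancy elimination) and what fraction of each encoder to
    run (model redundancy elimination)."""
    decisions = {}
    for modality, features in instance.items():
        saliency = sum(abs(x) for x in features) / len(features)
        if saliency < 0.1:        # IRE: skip a redundant input component
            continue
        width = 0.5 if saliency < 0.5 else 1.0  # MRE: shrink the submodel
        decisions[modality] = width
    return decisions

def infer(instance, encoders):
    """Run only the selected modalities at the selected widths, then fuse."""
    outputs = [encoders[m](instance[m], w)
               for m, w in policy(instance).items()]
    return sum(outputs) / max(len(outputs), 1)  # simple late fusion

# Toy encoders: 'width' would scale compute; here it just scales the output.
encoders = {
    "image": lambda feats, w: w * sum(feats),
    "audio": lambda feats, w: w * sum(feats) / 2,
}

# The low-saliency audio stream is skipped entirely; image runs at full width.
result = infer({"image": [0.9, 0.8], "audio": [0.01, 0.02]}, encoders)
```

In this toy run the policy drops the audio stream (its mean saliency falls below the skip threshold) and keeps the image encoder at full width, so only one encoder executes. The real framework additionally uses a three-stage training method to keep accuracy stable under these per-instance decisions.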
Journal description:
IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, covering aspects like mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.