AMRE: Adaptive Multilevel Redundancy Elimination for Multimodal Mobile Inference
Qixuan Cai; Ruikai Chu; Kaixuan Zhang; Xiulong Liu; Xinyu Tong; Xin Xie; Jiancheng Chen; Keqiu Li
IEEE Transactions on Mobile Computing, vol. 24, no. 8, pp. 7568-7583. Published 2025-03-10.
DOI: 10.1109/TMC.2025.3549422
Citation count: 0
Abstract
Given privacy and network load concerns, employing on-device multimodal neural networks (MNNs) for IoT data is a growing trend. However, the high computational demands of MNNs clash with limited on-device resources. MNNs involve input and model redundancies during inference, wasting resources to process redundant input components and run excess model parameters. Model Redundancy Elimination (MRE) reduces redundant parameters but cannot bypass inference for unnecessary input components. Input Redundancy Elimination (IRE) skips inference for redundant input components but cannot reduce computation for the remaining parts. MRE and IRE independently fail to meet the diverse computational needs of multimodal inference. To address these issues, we aim to combine the advantages of MRE and IRE to achieve a more efficient inference. We propose an adaptive multilevel redundancy elimination framework (AMRE), which supports both IRE and MRE. AMRE first establishes a collaborative inference mechanism for IRE and MRE. We then propose a multifunctional, lightweight policy model that adaptively controls the inference logic for each instance. Moreover, a three-stage training method is proposed to ensure the performance of collaborative inference in AMRE. We validate AMRE in three scenarios, achieving up to 52.91% lower latency, 56.79% lower energy cost, and a slight accuracy gain compared to state-of-the-art baselines.
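The abstract describes a lightweight policy model that, per instance, skips redundant input components (IRE) and shrinks the model run on the remaining ones (MRE). The paper's actual policy architecture is not given here, so the sketch below is purely illustrative: the saliency heuristic, the 0.1/0.5 thresholds, the width fractions, and the toy encoders are all assumptions, not AMRE's method.

```python
# Hypothetical sketch of AMRE-style collaborative IRE + MRE inference.
# All names, thresholds, and encoders are illustrative assumptions.

def policy(instance):
    """Lightweight policy: per instance, choose which modalities to keep
    (input redundancy elimination) and what fraction of each encoder to
    run (model redundancy elimination)."""
    decisions = {}
    for modality, features in instance.items():
        saliency = sum(abs(x) for x in features) / len(features)
        if saliency < 0.1:        # IRE: skip a redundant input component
            continue
        width = 0.5 if saliency < 0.5 else 1.0  # MRE: shrink the submodel
        decisions[modality] = width
    return decisions

def infer(instance, encoders):
    """Run only the selected modalities at the selected widths, then fuse."""
    outputs = [encoders[m](instance[m], w)
               for m, w in policy(instance).items()]
    return sum(outputs) / max(len(outputs), 1)  # simple late fusion

# Toy encoders: 'width' would scale compute; here it just scales the output.
encoders = {
    "image": lambda feats, w: w * sum(feats),
    "audio": lambda feats, w: w * sum(feats) / 2,
}

# The low-saliency audio stream is skipped entirely; image runs at full width.
result = infer({"image": [0.9, 0.8], "audio": [0.01, 0.02]}, encoders)
```

In this toy run the policy drops the audio stream (its mean saliency falls below the skip threshold) and keeps the image encoder at full width, so only one encoder executes. The real framework additionally uses a three-stage training method to keep accuracy stable under these per-instance decisions.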
Journal description:
IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, covering aspects like mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.