{"title":"Online Model Retraining, Compression, and Instance Allocation in Edge Computing Networks","authors":"Shijia Huang;Fan Yang;Qian Ma;Shimin Gong","doi":"10.1109/TNSE.2026.3665761","DOIUrl":null,"url":null,"abstract":"The adoption of artificial intelligence models (e.g., DNN models) in the Internet of Things has boosted computing demands in edge computing. Frequent model retraining, necessitated by concept drift, further increases resource usage, while model compression sacrifices model performance for computing efficiency. However, few works study the computing instance allocation problem while considering dynamic model retraining and compression, especially under varying workloads and model performance degradation. In this work, we formulate a joint online model retraining, compression, and instance allocation problem in edge computing networks that accounts for both model performance and instance cost. Solving the online problem is challenging since it is a non-linear binary programming problem with time-coupled instance switching costs. We first solve the online problem under fixed compression and propose an efficient online algorithm. Specifically, we linearize the non-linear term, then regularize the time-coupled switching cost to decouple the problem, and finally apply a randomized rounding method to obtain an integral solution. We prove that our algorithm achieves a constant optimality gap. We then solve the online problem under flexible compression and propose a lightweight online algorithm: we extend the linearization method, decouple the problem across time slots, and demonstrate that our algorithm achieves an optimality gap that depends on the time period. Simulations demonstrate that our algorithm can balance instance cost and model performance in both the fixed and flexible compression scenarios.","PeriodicalId":54229,"journal":{"name":"IEEE Transactions on Network Science and Engineering","volume":"13 ","pages":"7156-7172"},"PeriodicalIF":7.9000,"publicationDate":"2026-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Network Science and Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11397553/","RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0
Abstract
The adoption of artificial intelligence models (e.g., DNN models) in the Internet of Things has boosted computing demands in edge computing. Frequent model retraining, necessitated by concept drift, further increases resource usage, while model compression sacrifices model performance for computing efficiency. However, few works study the computing instance allocation problem while considering dynamic model retraining and compression, especially under varying workloads and model performance degradation. In this work, we formulate a joint online model retraining, compression, and instance allocation problem in edge computing networks that accounts for both model performance and instance cost. Solving the online problem is challenging since it is a non-linear binary programming problem with time-coupled instance switching costs. We first solve the online problem under fixed compression and propose an efficient online algorithm. Specifically, we linearize the non-linear term, then regularize the time-coupled switching cost to decouple the problem, and finally apply a randomized rounding method to obtain an integral solution. We prove that our algorithm achieves a constant optimality gap. We then solve the online problem under flexible compression and propose a lightweight online algorithm: we extend the linearization method, decouple the problem across time slots, and demonstrate that our algorithm achieves an optimality gap that depends on the time period. Simulations demonstrate that our algorithm can balance instance cost and model performance in both the fixed and flexible compression scenarios.
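The randomized rounding step mentioned in the abstract can be illustrated with a minimal sketch (an assumption for illustration, not the paper's actual algorithm): given a fractional instance count produced by a relaxed, regularized problem, round it to a neighboring integer so that the expected value equals the fractional solution. The function name `randomized_round` and the sample value 3.4 are illustrative choices, not taken from the paper.

```python
import random

def randomized_round(x, rng=random.Random(0)):
    """Round a fractional instance count x to floor(x) or floor(x)+1,
    choosing the larger value with probability equal to the fractional
    part, so the expected value of the result equals x."""
    base = int(x)          # floor for non-negative x
    frac = x - base
    return base + (1 if rng.random() < frac else 0)

# Empirical check: every rounded count is an integral neighbor of x,
# and the sample average approaches x as the sample size grows.
samples = [randomized_round(3.4) for _ in range(100_000)]
avg = sum(samples) / len(samples)
```

Preserving the expectation in this way is what lets such schemes relate the cost of the integral solution back to the cost of the fractional relaxation in expectation.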
Journal overview:
The IEEE Transactions on Network Science and Engineering (TNSE) is committed to the timely publication of peer-reviewed technical articles that deal with the theory and applications of network science and the interconnections among the elements in a system that form a network. In particular, TNSE publishes articles on the understanding, prediction, and control of the structures and behaviors of networks at the fundamental level. The types of networks covered include physical or engineered networks, information networks, biological networks, semantic networks, economic networks, social networks, and ecological networks. The journal aims to discover common principles that govern network structures, functionalities, and behaviors. Another trans-disciplinary focus of TNSE is the interactions between, and co-evolution of, different genres of networks.