Hybrid Redundancy for Reliable Task Offloading in Collaborative Edge Computing
Authors: Hao Guo; Lei Yang; Qingfeng Zhang; Jiannong Cao
Journal: IEEE Transactions on Computers, vol. 74, no. 9, pp. 3238-3250
Published: 2025-07-10
DOI: 10.1109/TC.2025.3587620
URL: https://ieeexplore.ieee.org/document/11077746/
Impact factor: 3.8; JCR: Q2 (Computer Science, Hardware & Architecture)
Citations: 0
Abstract
Collaborative edge computing enables task execution on the computing resources of geo-distributed edge nodes. One of the key challenges in this field is reliable task offloading: deciding whether to execute tasks locally or delegate them to neighboring nodes while ensuring task reliability. Achieving reliable task offloading is essential for preventing task failures and maintaining optimal system performance. Existing work commonly relies on task redundancy strategies, such as active or passive redundancy. However, these approaches lack adaptive redundancy mechanisms that respond to changes in the network environment, potentially resulting in resource wastage from excessive redundancy or task failures due to insufficient redundancy. In this work, we introduce a novel approach called Hybrid Redundancy for Task Offloading (HRTO) to optimize task latency and reliability. Specifically, HRTO uses deep reinforcement learning (DRL) to learn a task offloading policy that maximizes task success rates. With this policy, edge nodes dynamically adjust task redundancy levels based on real-time network load conditions and, at the same time, assess whether a task instance needs to be re-executed if it fails. Extensive experiments on real-world network topologies and a Kubernetes-based testbed demonstrate the effectiveness of HRTO, showing a 14.6% increase in success rate over the benchmarks.
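To make the trade-off between active and passive redundancy concrete, the following is a minimal sketch and not the HRTO algorithm itself. It assumes that every task attempt succeeds independently with the same probability p, that active redundancy means running several replicas in parallel, and that passive redundancy means re-executing after a failure. The function names, the fixed load threshold rule, and all parameter values are hypothetical; the paper learns its policy with DRL, whereas this sketch swaps in a hand-written rule purely for illustration.

```python
import math


def success_probability(p: float, replicas: int, retries: int) -> float:
    """Probability that at least one attempt of a task succeeds.

    Assumption (not from the paper): each attempt succeeds independently
    with probability p. `replicas` parallel copies model active redundancy;
    `retries` re-executions of all copies model passive redundancy.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if replicas < 1 or retries < 0:
        raise ValueError("need replicas >= 1 and retries >= 0")
    attempts = replicas * (retries + 1)
    return 1.0 - (1.0 - p) ** attempts


def choose_replicas(load: float, max_replicas: int = 3) -> int:
    """Hypothetical load-aware rule: use fewer replicas as load rises.

    `load` is a normalized network load in [0, 1]; values outside that
    range are clamped. This stands in for a learned policy.
    """
    load = min(max(load, 0.0), 1.0)
    return max(1, math.ceil(max_replicas * (1.0 - load)))


if __name__ == "__main__":
    for load in (0.0, 0.5, 1.0):
        k = choose_replicas(load)
        prob = success_probability(p=0.5, replicas=k, retries=1)
        print(f"load={load:.1f} replicas={k} success={prob:.4f}")
```

Under these assumptions, a lightly loaded node can afford more parallel replicas, while a heavily loaded node falls back to a single instance and relies on re-execution; an adaptive policy would tune both knobs instead of fixing either one.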
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.