ACRE: Actor Critic Reinforcement Learning for Failure-Aware Edge Computing Migrations
Marie Siew, Shikhar Sharma, Carlee Joe-Wong
2023 57th Annual Conference on Information Sciences and Systems (CISS)
Published: 2023-03-22 · DOI: 10.1109/CISS56502.2023.10089694
Citations: 0
Abstract
In edge computing, users' service profiles are migrated in response to user mobility to minimize the user-experienced delay, balanced against the migration cost. Because information on transition probabilities and costs is imperfect, reinforcement learning (RL) is often used to optimize service migration. Nevertheless, current works do not optimize service migration in light of occasional server failures. While server failures are rare, they impact the smooth and safe functioning of latency-sensitive edge computing applications like autonomous driving and real-time obstacle detection, because users can no longer complete their computing jobs. As these failures occur with low probability, it is difficult for RL algorithms, which are data- and experience-driven, to learn a service migration policy that is optimal for both the usual and rare-event scenarios. Therefore, we propose an algorithm, ImACRE, which integrates importance sampling into actor-critic reinforcement learning to learn the optimal service profile and backup placement policy. Our algorithm uses importance sampling to sample rare events in a simulator at a rate proportional to their contribution to system costs, while balancing the trade-off between delay and migration costs against failure, backup placement, and backup migration costs. We use trace-driven experiments to show that our algorithm reduces costs in the event of failures.
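The core idea behind the abstract's importance-sampling step can be illustrated with a minimal sketch: failures are sampled in a simulator at an inflated rate, and each sampled transition is reweighted by the likelihood ratio between the true and the simulated failure probabilities, so cost estimates remain unbiased. The function name, the specific probabilities, and the two-outcome cost model below are illustrative assumptions, not the paper's actual simulator or ImACRE itself.

```python
import random

def expected_cost_importance_sampled(p_true, p_sim, failure_cost,
                                     normal_cost, n=100_000, seed=0):
    """Estimate the expected per-step cost under the true (rare) failure
    probability p_true, while drawing failures at the inflated rate p_sim.
    Each sample is reweighted by the importance weight (likelihood ratio),
    keeping the estimator unbiased. Toy two-outcome model, not the paper's."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        failed = rng.random() < p_sim          # sample at the inflated rate
        if failed:
            w = p_true / p_sim                 # importance weight for failures
            total += w * failure_cost
        else:
            w = (1 - p_true) / (1 - p_sim)     # weight for normal operation
            total += w * normal_cost
    return total / n

# Rare-event setting: true failure probability 0.001, simulated at 0.2,
# so roughly 20% of samples exercise the failure branch instead of ~0.1%.
est = expected_cost_importance_sampled(p_true=0.001, p_sim=0.2,
                                       failure_cost=100.0, normal_cost=1.0)
analytic = 0.001 * 100.0 + 0.999 * 1.0  # exact expected cost: 1.099
```

In an actor-critic setting, the same likelihood-ratio weight would multiply the TD error (or the sampled cost) of each failure transition, which is how oversampling rare failures can be corrected so the learned policy still optimizes the true expected cost.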