Distributed Online Service Coordination Using Deep Reinforcement Learning

Stefan Schneider, Haydar Qarawlus, Holger Karl
{"title":"Distributed Online Service Coordination Using Deep Reinforcement Learning","authors":"Stefan Schneider, Haydar Qarawlus, Holger Karl","doi":"10.1109/ICDCS51616.2021.00058","DOIUrl":null,"url":null,"abstract":"Services often consist of multiple chained components such as microservices in a service mesh, or machine learning functions in a pipeline. Providing these services requires online coordination including scaling the service, placing instance of all components in the network, scheduling traffic to these instances, and routing traffic through the network. Optimized service coordination is still a hard problem due to many influencing factors such as rapidly arriving user demands and limited node and link capacity. Existing approaches to solve the problem are often built on rigid models and assumptions, tailored to specific scenarios. If the scenario changes and the assumptions no longer hold, they easily break and require manual adjustments by experts. Novel self-learning approaches using deep reinforcement learning (DRL) are promising but still have limitations as they only address simplified versions of the problem and are typically centralized and thus do not scale to practical large-scale networks. To address these issues, we propose a distributed self-learning service coordination approach using DRL. After centralized training, we deploy a distributed DRL agent at each node in the network, making fast coordination decisions locally in parallel with the other nodes. Each agent only observes its direct neighbors and does not need global knowledge. Hence, our approach scales independently from the size of the network. In our extensive evaluation using real-world network topologies and traffic traces, we show that our proposed approach outperforms a state-of-the-art conventional heuristic as well as a centralized DRL approach (60 % higher throughput on average) while requiring less time per online decision (1 ms).","PeriodicalId":222376,"journal":{"name":"2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)","volume":"137 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS51616.2021.00058","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

Services often consist of multiple chained components, such as microservices in a service mesh or machine learning functions in a pipeline. Providing these services requires online coordination, including scaling the service, placing instances of all components in the network, scheduling traffic to these instances, and routing traffic through the network. Optimized service coordination is still a hard problem due to many influencing factors such as rapidly arriving user demands and limited node and link capacity. Existing approaches to solve the problem are often built on rigid models and assumptions, tailored to specific scenarios. If the scenario changes and the assumptions no longer hold, they easily break and require manual adjustments by experts. Novel self-learning approaches using deep reinforcement learning (DRL) are promising but still have limitations, as they only address simplified versions of the problem and are typically centralized and thus do not scale to practical large-scale networks. To address these issues, we propose a distributed self-learning service coordination approach using DRL. After centralized training, we deploy a distributed DRL agent at each node in the network, making fast coordination decisions locally in parallel with the other nodes. Each agent only observes its direct neighbors and does not need global knowledge. Hence, our approach scales independently of the size of the network. In our extensive evaluation using real-world network topologies and traffic traces, we show that our proposed approach outperforms a state-of-the-art conventional heuristic as well as a centralized DRL approach (60% higher throughput on average) while requiring less time per online decision (1 ms).
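To illustrate the core idea of a per-node agent that acts only on local observations, the following minimal Python sketch shows what the deployed inference step at a single node could look like. All class names, observation layouts, and network shapes here are illustrative assumptions; the abstract does not specify the agents' architecture or training algorithm. The policy weights would come from the centralized training phase described above; only the local forward pass at decision time is sketched.

```python
import numpy as np

class LocalAgent:
    """Hypothetical per-node agent: maps a fixed-size local observation
    (own load plus load and link utilization of each direct neighbor)
    to a traffic-split action (process locally vs. forward to a neighbor).
    No global knowledge is required, so the observation and action sizes
    depend only on the node degree, not on the network size."""

    def __init__(self, num_neighbors, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        obs_dim = 1 + 2 * num_neighbors   # own load + (load, link util) per neighbor
        act_dim = 1 + num_neighbors       # local share + share per neighbor
        # Tiny two-layer policy; in practice these weights would be the
        # result of the (centralized) DRL training, not random values.
        self.w1 = rng.normal(0.0, 0.1, (obs_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, act_dim))

    def act(self, own_load, neighbor_loads, link_utils):
        obs = np.concatenate(([own_load], neighbor_loads, link_utils))
        h = np.tanh(obs @ self.w1)
        logits = h @ self.w2
        # Softmax turns the logits into a traffic-split distribution.
        e = np.exp(logits - logits.max())
        return e / e.sum()

# Example: a node with two neighbors decides how to split incoming traffic.
agent = LocalAgent(num_neighbors=2)
split = agent.act(own_load=0.7, neighbor_loads=[0.2, 0.9], link_utils=[0.1, 0.5])
print(split)  # [p_local, p_neighbor1, p_neighbor2], sums to 1
```

Because each agent's input covers only its direct neighborhood, such a forward pass is cheap (consistent with the reported ~1 ms per online decision) and all nodes can decide in parallel.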