Learning the Optimal Controller Placement in Mobile Software-Defined Networks

I. Koutsopoulos
{"title":"Learning the Optimal Controller Placement in Mobile Software-Defined Networks","authors":"I. Koutsopoulos","doi":"10.1109/WoWMoM54355.2022.00029","DOIUrl":null,"url":null,"abstract":"We formulate and study the problem of online learning of the optimal controller selection policy in mobile Software-Defined Networks, where the controller-switch round-trip-time (RTT) delays are unknown and time-varying. Static optimization approaches are not helpful, since delays vary significantly (and sometimes, arbitrarily) from one slot to another, and only RTT delays from the current active controller can be easily measured. First, we model the sequence of RTT delays across time as a stationary random process so that the value at each time slot is a sample from an unknown probability distribution with unknown mean. This approach is applicable in relatively static network settings, where stationarity can be assumed. We cast the problem as a stochastic multiarmed bandit, where the arms are the different controller choices, and we fit different bandit algorithms to that setting, such as: the Lowest Confidence Bound (LCB) algorithm by modifying the known Upper Confidence Bound (UCB) one, the LCB-tuned one, and the Boltzmann exploration one. The first two are known to achieve sublinear regret, while the last one turns out to be very efficient. In a second approach, the random process of RTTs is non-stationary and thus cannot be characterized statistically. This scenario is applicable in cases of arbitrary mobility and other dynamics that affect RTT delays in an unpredictable, adversarial manner. We pose the problem as an adversarial bandit that can be solved with the EXP3 algorithm which achieves sublinear regret. We argue that all approaches can be implemented in an SDN environment with lightweight messaging. We also compare the performance of these algorithms for different problem settings and hyper-parameters that reflect the efficiency of the learning process. Numerical evaluation shows that Boltzmann exploration achieves the best performance.","PeriodicalId":275324,"journal":{"name":"2022 IEEE 23rd International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 23rd International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WoWMoM54355.2022.00029","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

We formulate and study the problem of online learning of the optimal controller selection policy in mobile Software-Defined Networks, where the controller-switch round-trip-time (RTT) delays are unknown and time-varying. Static optimization approaches are not helpful, since delays vary significantly (and sometimes arbitrarily) from one slot to another, and only the RTT delays from the currently active controller can be easily measured. First, we model the sequence of RTT delays across time as a stationary random process, so that the value at each time slot is a sample from an unknown probability distribution with unknown mean. This approach is applicable in relatively static network settings, where stationarity can be assumed. We cast the problem as a stochastic multi-armed bandit, where the arms are the different controller choices, and we fit different bandit algorithms to that setting: the Lowest Confidence Bound (LCB) algorithm, obtained by adapting the well-known Upper Confidence Bound (UCB) algorithm to delay minimization; its tuned variant, LCB-tuned; and Boltzmann exploration. The first two are known to achieve sublinear regret, while the last one turns out to be very efficient. In a second approach, the random process of RTTs is non-stationary and thus cannot be characterized statistically. This scenario is applicable in cases of arbitrary mobility and other dynamics that affect RTT delays in an unpredictable, adversarial manner. We pose the problem as an adversarial bandit that can be solved with the EXP3 algorithm, which achieves sublinear regret. We argue that all approaches can be implemented in an SDN environment with lightweight messaging. We also compare the performance of these algorithms for different problem settings and hyper-parameters that reflect the efficiency of the learning process. Numerical evaluation shows that Boltzmann exploration achieves the best performance.
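To make the stochastic approach concrete, the following is a minimal Python sketch of LCB selection, i.e., the standard UCB1 rule adapted to delay minimization by subtracting (rather than adding) the confidence radius. The `measure_rtt` hook, the normalization of RTTs to [0, 1], and the synthetic delay model in the demo are assumptions made for illustration, not details from the paper.

```python
import math
import random


def lcb_select(num_controllers, horizon, measure_rtt):
    """LCB controller selection: UCB1 adapted to delay minimization.

    measure_rtt(k) is a hypothetical hook that returns the RTT delay of
    controller k at the current slot, assumed normalized to [0, 1].
    """
    counts = [0] * num_controllers        # times each controller was selected
    mean_delay = [0.0] * num_controllers  # empirical mean RTT per controller

    for t in range(1, horizon + 1):
        if t <= num_controllers:
            k = t - 1  # initialization: try every controller once
        else:
            # Lowest Confidence Bound: optimism under a minimization objective.
            k = min(
                range(num_controllers),
                key=lambda i: mean_delay[i]
                - math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        d = measure_rtt(k)  # only the active controller's RTT is observed
        counts[k] += 1
        mean_delay[k] += (d - mean_delay[k]) / counts[k]  # running-mean update
    return mean_delay, counts


if __name__ == "__main__":
    # Synthetic stationary setting: controller i has mean RTT means[i].
    means = [0.2, 0.3, 0.5, 0.7]
    rtt = lambda k: min(1.0, max(0.0, random.gauss(means[k], 0.05)))
    est, n = lcb_select(len(means), 5000, rtt)
    print("estimated mean delays:", [round(m, 3) for m in est])
    print("selection counts:", n)
```

In a stationary setting, the selection counts concentrate on the lowest-delay controller while the confidence radius guarantees that every controller is still sampled occasionally.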
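Boltzmann exploration replaces the deterministic confidence-bound rule with randomized softmax selection over the empirical mean delays. A possible sketch follows; the `temperature` hyper-parameter name and its default value are assumptions, and this parameter is one of the hyper-parameters whose effect the paper's evaluation examines.

```python
import math
import random


def boltzmann_select(mean_delay, temperature=0.1):
    """Sample a controller with probability proportional to exp(-mean/T).

    Lower empirical mean delay -> higher selection probability; the
    temperature hyper-parameter trades exploration for exploitation
    (high T ~ uniform exploration, low T ~ greedy selection).
    """
    weights = [math.exp(-m / temperature) for m in mean_delay]
    total = sum(weights)
    r, acc = random.random() * total, 0.0
    for k, w in enumerate(weights):
        acc += w
        if r <= acc:
            return k
    return len(weights) - 1  # numerical safeguard
```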
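For the non-stationary case, a textbook EXP3 implementation could look like the sketch below. The reward scaling 1 - d for a normalized delay d and the default gamma = 0.1 are assumptions; only the RTT of the currently selected controller is observed in each slot, matching the partial-feedback setting described in the abstract.

```python
import math
import random


def exp3(num_controllers, horizon, measure_rtt, gamma=0.1):
    """EXP3 for adversarially varying RTTs (non-stationary case).

    The reward of playing controller k is taken as 1 - d, with d the
    observed RTT normalized to [0, 1] -- an assumed scaling.
    """
    weights = [1.0] * num_controllers
    for _ in range(horizon):
        total = sum(weights)
        # Mix the exponential weights with uniform exploration.
        probs = [(1.0 - gamma) * w / total + gamma / num_controllers
                 for w in weights]
        r, acc, k = random.random(), 0.0, num_controllers - 1
        for i, p in enumerate(probs):
            acc += p
            if r <= acc:
                k = i
                break
        reward = 1.0 - measure_rtt(k)   # bandit feedback: one arm per slot
        estimate = reward / probs[k]    # importance-weighted reward estimate
        weights[k] *= math.exp(gamma * estimate / num_controllers)
    return weights
```

The importance-weighted estimate keeps the reward estimates unbiased even though only one controller's RTT is observed per slot, which is what yields EXP3's sublinear regret guarantee against an adversarial delay sequence.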