{"title":"非平稳线性动力系统控制的动态遗憾最小化","authors":"Yuwei Luo, Varun Gupta, M. Kolar","doi":"10.1145/3489048.3522649","DOIUrl":null,"url":null,"abstract":"We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon T with fixed and known cost matrices Q,R, but unknown and non-stationary dynamics At, Bt. The sequence of dynamics matrices can be arbitrary, but with a total variation, VT, assumed to be o(T) and unknown to the controller. Under the assumption that a sequence of stabilizing, but potentially sub-optimal controllers is available for all t, we present an algorithm that achieves the optimal dynamic regret of Õ(VT2/5 T3/5). With piecewise constant dynamics, our algorithm achieves the optimal regret of Õ(√ST) where S is the number of switches. The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems. We also argue that non-adaptive forgetting (e.g., restarting or using sliding window learning with a static window size) may not be regret optimal for the LQR problem, even when the window size is optimally tuned with the knowledge of VT. The main technical challenge in the analysis of our algorithm is to prove that the ordinary least squares (OLS) estimator has a small bias when the parameter to be estimated is non-stationary. Our analysis also highlights that the key motif driving the regret is that the LQR problem is in spirit a bandit problem with linear feedback and locally quadratic cost. This motif is more universal than the LQR problem itself, and therefore we believe our results should find wider application.","PeriodicalId":264598,"journal":{"name":"Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems","volume":"54 4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems\",\"authors\":\"Yuwei Luo, Varun Gupta, M. Kolar\",\"doi\":\"10.1145/3489048.3522649\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon T with fixed and known cost matrices Q,R, but unknown and non-stationary dynamics At, Bt. The sequence of dynamics matrices can be arbitrary, but with a total variation, VT, assumed to be o(T) and unknown to the controller. Under the assumption that a sequence of stabilizing, but potentially sub-optimal controllers is available for all t, we present an algorithm that achieves the optimal dynamic regret of Õ(VT2/5 T3/5). With piecewise constant dynamics, our algorithm achieves the optimal regret of Õ(√ST) where S is the number of switches. The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems. We also argue that non-adaptive forgetting (e.g., restarting or using sliding window learning with a static window size) may not be regret optimal for the LQR problem, even when the window size is optimally tuned with the knowledge of VT. 
The main technical challenge in the analysis of our algorithm is to prove that the ordinary least squares (OLS) estimator has a small bias when the parameter to be estimated is non-stationary. Our analysis also highlights that the key motif driving the regret is that the LQR problem is in spirit a bandit problem with linear feedback and locally quadratic cost. This motif is more universal than the LQR problem itself, and therefore we believe our results should find wider application.\",\"PeriodicalId\":264598,\"journal\":{\"name\":\"Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems\",\"volume\":\"54 4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-06-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3489048.3522649\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3489048.3522649","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
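To make the setting concrete, here is a minimal simulation sketch of a non-stationary LQR instance driven by a fixed stabilizing gain. The dimensions, drift model, noise scale, and gain K below are illustrative assumptions (the abstract specifies no concrete instance), and the loop implements the plant model x_{t+1} = A_t x_t + B_t u_t + w_t with quadratic per-step cost, not the paper's algorithm.

```python
import numpy as np

# Sketch of the non-stationary LQR setting: x_{t+1} = A_t x_t + B_t u_t + w_t,
# per-step cost x_t' Q x_t + u_t' R u_t. All concrete values are assumptions.
rng = np.random.default_rng(0)
n, m, T = 2, 1, 1000

Q = np.eye(n)                      # fixed, known state-cost matrix
R = np.eye(m)                      # fixed, known control-cost matrix
K = np.array([[0.5, 0.5]])         # a stabilizing (possibly sub-optimal) gain, assumed given

A0 = np.array([[0.9, 0.1],
               [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])

x = np.zeros(n)
total_cost = 0.0
for t in range(T):
    # Slow drift in the dynamics; the total variation V_T of (A_t, B_t) is o(T).
    A_t = A0 + 0.1 * np.sin(2 * np.pi * t / T) * np.eye(n)
    u = -K @ x                                   # linear state feedback
    total_cost += x @ Q @ x + u @ R @ u
    x = A_t @ x + B @ u + 0.1 * rng.standard_normal(n)

print(f"average per-step cost: {total_cost / T:.4f}")
```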
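The OLS system-identification step discussed in the abstract can be sketched as follows: regress x_{t+1} on z_t = (x_t, u_t) to estimate Θ = [A B]. When (A_t, B_t) drift within the estimation window, the OLS solution is effectively a weighted average of the drifting parameters, hence biased; bounding that bias is the technical challenge the paper addresses. The generic helper below is an assumed formulation, not code from the paper.

```python
import numpy as np

# Generic OLS identification of linear dynamics from a trajectory:
# stack regressors z_t = (x_t, u_t) and targets x_{t+1}, solve Z @ Theta ≈ Y.
def ols_dynamics(xs, us):
    """xs: (T+1, n) states; us: (T, m) inputs; returns (A_hat, B_hat)."""
    Z = np.hstack([xs[:-1], us])                    # regressors z_t = (x_t, u_t)
    Y = xs[1:]                                      # targets x_{t+1}
    Theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)   # least-squares fit, shape (n+m, n)
    n = xs.shape[1]
    return Theta.T[:, :n], Theta.T[:, n:]           # A_hat: (n, n), B_hat: (n, m)
```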
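Finally, a hypothetical illustration of adaptive forgetting in the spirit of the detection strategy mentioned in the abstract: fit OLS on the data gathered since the last restart, and restart once the one-step prediction residual exceeds a threshold. The abstract does not give the paper's actual test (which builds on techniques from contextual multi-armed bandits), so both the rule and the threshold here are placeholders.

```python
import numpy as np

# Placeholder restart rule, NOT the paper's test: refit OLS on data since the
# last restart, and restart whenever the one-step prediction residual is larger
# than a stationary system would plausibly produce.
def adaptive_restarts(xs, us, threshold=1.0):
    """xs: (T+1, n) states; us: (T, m) inputs; returns the restart times."""
    n, m = xs.shape[1], us.shape[1]
    restarts, start = [], 0
    for t in range(len(us)):
        if t - start < 2 * (n + m):                       # need enough samples first
            continue
        Z = np.hstack([xs[start:t], us[start:t]])         # regressors since last restart
        Y = xs[start + 1:t + 1]                           # matching targets x_{s+1}
        Theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        resid = xs[t + 1] - np.hstack([xs[t], us[t]]) @ Theta
        if np.linalg.norm(resid) > threshold:             # drift detected: forget history
            restarts.append(t)
            start = t
    return restarts
```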