Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems

Yuwei Luo, Varun Gupta, M. Kolar
Published in: Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems
Publication date: 2022-06-06
DOI: 10.1145/3489048.3522649 (https://doi.org/10.1145/3489048.3522649)
Citations: 1

Abstract

We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon $T$ with fixed and known cost matrices $Q, R$, but unknown and non-stationary dynamics $A_t, B_t$. The sequence of dynamics matrices can be arbitrary, but its total variation, $V_T$, is assumed to be $o(T)$ and is unknown to the controller. Under the assumption that a sequence of stabilizing, but potentially sub-optimal, controllers is available for all $t$, we present an algorithm that achieves the optimal dynamic regret of $\tilde{O}(V_T^{2/5} T^{3/5})$. With piecewise constant dynamics, our algorithm achieves the optimal regret of $\tilde{O}(\sqrt{ST})$, where $S$ is the number of switches. The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems. We also argue that non-adaptive forgetting (e.g., restarting or using sliding window learning with a static window size) may not be regret-optimal for the LQR problem, even when the window size is optimally tuned with the knowledge of $V_T$. The main technical challenge in the analysis of our algorithm is to prove that the ordinary least squares (OLS) estimator has a small bias when the parameter to be estimated is non-stationary. Our analysis also highlights that the key motif driving the regret is that the LQR problem is in spirit a bandit problem with linear feedback and locally quadratic cost. This motif is more universal than the LQR problem itself, and therefore we believe our results should find wider application.
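To make the role of the OLS estimator under non-stationary dynamics concrete, the following is a minimal Python sketch, not the algorithm from the paper. It simulates a scalar system whose parameter $A_t$ switches once, then compares an OLS estimate of $[A \; B]$ fitted on the full history with one fitted on a fixed recent window. Every numeric value (the parameters 0.5 and 0.9, the gain -0.5, the horizon, the window length, the noise level) and every function name is an assumption chosen purely for illustration.

```python
import numpy as np

def total_variation(thetas):
    """Sum of Frobenius-norm changes between consecutive parameter matrices."""
    return float(sum(np.linalg.norm(thetas[t + 1] - thetas[t])
                     for t in range(len(thetas) - 1)))

def ols_dynamics(xs, us, xs_next):
    """Least-squares estimate of Theta = [A B] from x_{t+1} ~ A x_t + B u_t."""
    z = np.hstack([xs, us])  # regressors z_t = (x_t, u_t), one row per step
    theta_t, *_ = np.linalg.lstsq(z, xs_next, rcond=None)  # z @ Theta^T ~ x_next
    return theta_t.T

def simulate(thetas, k, noise_std, rng):
    """Roll out x_{t+1} = A_t x_t + B_t u_t + w_t with u_t = K x_t + exploration."""
    dx = thetas[0].shape[0]
    du = thetas[0].shape[1] - dx
    x = np.zeros(dx)
    xs, us, xs_next = [], [], []
    for theta in thetas:
        a, b = theta[:, :dx], theta[:, dx:]
        u = k @ x + rng.normal(size=du)  # stabilizing feedback plus exploration noise
        x_new = a @ x + b @ u + noise_std * rng.normal(size=dx)
        xs.append(x)
        us.append(u)
        xs_next.append(x_new)
        x = x_new
    return np.array(xs), np.array(us), np.array(xs_next)

# Hypothetical setup: one switch in A at the midpoint, B fixed at 1.
rng = np.random.default_rng(0)
T, switch, window = 2000, 1000, 200
theta_before = np.array([[0.5, 1.0]])
theta_after = np.array([[0.9, 1.0]])
thetas = [theta_before] * switch + [theta_after] * (T - switch)
k = np.array([[-0.5]])  # assumed gain; stabilizes both regimes (A + BK = 0.0 and 0.4)

xs, us, xs_next = simulate(thetas, k, noise_std=0.1, rng=rng)
theta_full = ols_dynamics(xs, us, xs_next)
theta_recent = ols_dynamics(xs[-window:], us[-window:], xs_next[-window:])

err_full = np.linalg.norm(theta_full - theta_after)
err_recent = np.linalg.norm(theta_recent - theta_after)
print("total variation:", total_variation(thetas))
print("error, full history:", err_full)
print("error, recent window:", err_recent)
```

In this toy setting the full-history estimate blends the two regimes, while the recent-window estimate tracks the current parameter. The window here happens to lie entirely after the switch; a controller that does not know when the dynamics change cannot rely on that, which is one intuition for why the abstract emphasizes adaptive detection over a static window size.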