Latest Publications: Conference on Learning for Dynamics & Control

Regret Analysis of Online LQR Control via Trajectory Prediction and Tracking: Extended Version
Conference on Learning for Dynamics & Control. Pub Date: 2023-02-21. DOI: 10.48550/arXiv.2302.10411
Yitian Chen, Timothy L. Molloy, T. Summers, I. Shames
{"title":"Regret Analysis of Online LQR Control via Trajectory Prediction and Tracking: Extended Version","authors":"Yitian Chen, Timothy L. Molloy, T. Summers, I. Shames","doi":"10.48550/arXiv.2302.10411","DOIUrl":"https://doi.org/10.48550/arXiv.2302.10411","url":null,"abstract":"In this paper, we propose and analyze a new method for online linear quadratic regulator (LQR) control with a priori unknown time-varying cost matrices. The cost matrices are revealed sequentially with the potential for future values to be previewed over a short window. Our novel method involves using the available cost matrices to predict the optimal trajectory, and a tracking controller to drive the system towards it. We adopted the notion of dynamic regret to measure the performance of this proposed online LQR control method, with our main result being that the (dynamic) regret of our method is upper bounded by a constant. Moreover, the regret upper bound decays exponentially with the preview window length, and is extendable to systems with disturbances. We show in simulations that our proposed method offers improved performance compared to other previously proposed online LQR methods.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128669982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Modified Policy Iteration for Exponential Cost Risk Sensitive MDPs
Conference on Learning for Dynamics & Control. Pub Date: 2023-02-08. DOI: 10.48550/arXiv.2302.03811
Yashaswini Murthy, Mehrdad Moharrami, R. Srikant
{"title":"Modified Policy Iteration for Exponential Cost Risk Sensitive MDPs","authors":"Yashaswini Murthy, Mehrdad Moharrami, R. Srikant","doi":"10.48550/arXiv.2302.03811","DOIUrl":"https://doi.org/10.48550/arXiv.2302.03811","url":null,"abstract":"Modified policy iteration (MPI) also known as optimistic policy iteration is at the core of many reinforcement learning algorithms. It works by combining elements of policy iteration and value iteration. The convergence of MPI has been well studied in the case of discounted and average-cost MDPs. In this work, we consider the exponential cost risk-sensitive MDP formulation, which is known to provide some robustness to model parameters. Although policy iteration and value iteration have been well studied in the context of risk sensitive MDPs, modified policy iteration is relatively unexplored. We provide the first proof that MPI also converges for the risk-sensitive problem in the case of finite state and action spaces. Since the exponential cost formulation deals with the multiplicative Bellman equation, our main contribution is a convergence proof which is quite different than existing results for discounted and risk-neutral average-cost problems. The proof of approximate modified policy iteration for risk sensitive MDPs is also provided in the appendix.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"440 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123839237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Certified Invertibility in Neural Networks via Mixed-Integer Programming
Conference on Learning for Dynamics & Control. Pub Date: 2023-01-27. DOI: 10.48550/arXiv.2301.11783
Tianqi Cui, Tom S. Bertalan, George J. Pappas, M. Morari, I. Kevrekidis, Mahyar Fazlyab
{"title":"Certified Invertibility in Neural Networks via Mixed-Integer Programming","authors":"Tianqi Cui, Tom S. Bertalan, George J. Pappas, M. Morari, I. Kevrekidis, Mahyar Fazlyab","doi":"10.48550/arXiv.2301.11783","DOIUrl":"https://doi.org/10.48550/arXiv.2301.11783","url":null,"abstract":"Neural networks are known to be vulnerable to adversarial attacks, which are small, imperceptible perturbations that can significantly alter the network's output. Conversely, there may exist large, meaningful perturbations that do not affect the network's decision (excessive invariance). In our research, we investigate this latter phenomenon in two contexts: (a) discrete-time dynamical system identification, and (b) the calibration of a neural network's output to that of another network. We examine noninvertibility through the lens of mathematical optimization, where the global solution measures the ``safety\"of the network predictions by their distance from the non-invertibility boundary. We formulate mixed-integer programs (MIPs) for ReLU networks and $L_p$ norms ($p=1,2,infty$) that apply to neural network approximators of dynamical systems. We also discuss how our findings can be useful for invertibility certification in transformations between neural networks, e.g. between different levels of network pruning.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128823767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Online switching control with stability and regret guarantees
Conference on Learning for Dynamics & Control. Pub Date: 2023-01-20. DOI: 10.48550/arXiv.2301.08445
Yingying Li, James A. Preiss, Na Li, Yiheng Lin, A. Wierman, J. Shamma
{"title":"Online switching control with stability and regret guarantees","authors":"Yingying Li, James A. Preiss, Na Li, Yiheng Lin, A. Wierman, J. Shamma","doi":"10.48550/arXiv.2301.08445","DOIUrl":"https://doi.org/10.48550/arXiv.2301.08445","url":null,"abstract":"This paper considers online switching control with a finite candidate controller pool, an unknown dynamical system, and unknown cost functions. The candidate controllers can be unstabilizing policies. We only require at least one candidate controller to satisfy certain stability properties, but we do not know which one is stabilizing. We design an online algorithm that guarantees finite-gain stability throughout the duration of its execution. We also provide a sublinear policy regret guarantee compared with the optimal stabilizing candidate controller. Lastly, we numerically test our algorithm on quadrotor planar flights and compare it with a classical switching control algorithm, falsification-based switching, and a classical multi-armed bandit algorithm, Exp3 with batches.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114656214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?
Conference on Learning for Dynamics & Control. Pub Date: 2022-12-30. DOI: 10.48550/arXiv.2212.14511
Yi Tian, K. Zhang, Russ Tedrake, S. Sra
{"title":"Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?","authors":"Yi Tian, K. Zhang, Russ Tedrake, S. Sra","doi":"10.48550/arXiv.2212.14511","DOIUrl":"https://doi.org/10.48550/arXiv.2212.14511","url":null,"abstract":"We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning (e.g., costs) without reconstructing the observations. In particular, we focus on an intuitive cost-driven state representation learning method for solving Linear Quadratic Gaussian (LQG) control, one of the most fundamental partially observable control problems. As our main results, we establish finite-sample guarantees of finding a near-optimal state representation function and a near-optimal controller using the directly learned latent model. To the best of our knowledge, despite various empirical successes, prior to this work it was unclear if such a cost-driven latent model learner enjoys finite-sample guarantees. Our work underscores the value of predicting multi-step costs, an idea that is key to our theory, and notably also an idea that is known to be empirically valuable for learning state representations.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130463164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Data-driven Stochastic Output-Feedback Predictive Control: Recursive Feasibility through Interpolated Initial Conditions
Conference on Learning for Dynamics & Control. Pub Date: 2022-12-15. DOI: 10.48550/arXiv.2212.07661
Guanru Pan, Ruchuan Ou, T. Faulwasser
{"title":"Data-driven Stochastic Output-Feedback Predictive Control: Recursive Feasibility through Interpolated Initial Conditions","authors":"Guanru Pan, Ruchuan Ou, T. Faulwasser","doi":"10.48550/arXiv.2212.07661","DOIUrl":"https://doi.org/10.48550/arXiv.2212.07661","url":null,"abstract":"The paper investigates data-driven output-feedback predictive control of linear systems subject to stochastic disturbances. The scheme relies on the recursive solution of a suitable data-driven reformulation of a stochastic Optimal Control Problem (OCP), which allows for forward prediction and optimization of statistical distributions of inputs and outputs. Our approach avoids the use of parametric system models. Instead it is based on previously recorded data using a recently proposed stochastic variant of Willems' fundamental lemma. The stochastic variant of the lemma is applicable to a large class of linear dynamics subject to stochastic disturbances of Gaussian and non-Gaussian nature. To ensure recursive feasibility, the initial condition of the OCP -- which consists of information about past inputs and outputs -- is considered as an extra decision variable of the OCP. We provide sufficient conditions for recursive feasibility and closed-loop practical stability of the proposed scheme as well as performance bounds. Finally, a numerical example illustrates the efficacy and closed-loop properties of the proposed scheme.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122146886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Hybrid Multi-agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems
Conference on Learning for Dynamics & Control. Pub Date: 2022-12-14. DOI: 10.48550/arXiv.2212.07313
Tobias Enders, James Harrison, M. Pavone, Maximilian Schiffer
{"title":"Hybrid Multi-agent Deep Reinforcement Learning for Autonomous Mobility on Demand Systems","authors":"Tobias Enders, James Harrison, M. Pavone, Maximilian Schiffer","doi":"10.48550/arXiv.2212.07313","DOIUrl":"https://doi.org/10.48550/arXiv.2212.07313","url":null,"abstract":"We consider the sequential decision-making problem of making proactive request assignment and rejection decisions for a profit-maximizing operator of an autonomous mobility on demand system. We formalize this problem as a Markov decision process and propose a novel combination of multi-agent Soft Actor-Critic and weighted bipartite matching to obtain an anticipative control policy. Thereby, we factorize the operator's otherwise intractable action space, but still obtain a globally coordinated decision. Experiments based on real-world taxi data show that our method outperforms state of the art benchmarks with respect to performance, stability, and computational tractability.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123995434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Learning Disturbances Online for Risk-Aware Control: Risk-Aware Flight with Less Than One Minute of Data
Conference on Learning for Dynamics & Control. Pub Date: 2022-12-12. DOI: 10.48550/arXiv.2212.06253
Prithvi Akella, Skylar X. Wei, J. Burdick, A. Ames
{"title":"Learning Disturbances Online for Risk-Aware Control: Risk-Aware Flight with Less Than One Minute of Data","authors":"Prithvi Akella, Skylar X. Wei, J. Burdick, A. Ames","doi":"10.48550/arXiv.2212.06253","DOIUrl":"https://doi.org/10.48550/arXiv.2212.06253","url":null,"abstract":"Recent advances in safety-critical risk-aware control are predicated on apriori knowledge of the disturbances a system might face. This paper proposes a method to efficiently learn these disturbances online, in a risk-aware context. First, we introduce the concept of a Surface-at-Risk, a risk measure for stochastic processes that extends Value-at-Risk -- a commonly utilized risk measure in the risk-aware controls community. Second, we model the norm of the state discrepancy between the model and the true system evolution as a scalar-valued stochastic process and determine an upper bound to its Surface-at-Risk via Gaussian Process Regression. Third, we provide theoretical results on the accuracy of our fitted surface subject to mild assumptions that are verifiable with respect to the data sets collected during system operation. Finally, we experimentally verify our procedure by augmenting a drone's controller and highlight performance increases achieved via our risk-aware approach after collecting less than a minute of operating data.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114484461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Targeted Adversarial Attacks against Neural Network Trajectory Predictors
Conference on Learning for Dynamics & Control. Pub Date: 2022-12-08. DOI: 10.48550/arXiv.2212.04138
Kai Liang Tan, J. Wang, Y. Kantaros
{"title":"Targeted Adversarial Attacks against Neural Network Trajectory Predictors","authors":"Kai Liang Tan, J. Wang, Y. Kantaros","doi":"10.48550/arXiv.2212.04138","DOIUrl":"https://doi.org/10.48550/arXiv.2212.04138","url":null,"abstract":"Trajectory prediction is an integral component of modern autonomous systems as it allows for envisioning future intentions of nearby moving agents. Due to the lack of other agents' dynamics and control policies, deep neural network (DNN) models are often employed for trajectory forecasting tasks. Although there exists an extensive literature on improving the accuracy of these models, there is a very limited number of works studying their robustness against adversarially crafted input trajectories. To bridge this gap, in this paper, we propose a targeted adversarial attack against DNN models for trajectory forecasting tasks. We call the proposed attack TA4TP for Targeted adversarial Attack for Trajectory Prediction. Our approach generates adversarial input trajectories that are capable of fooling DNN models into predicting user-specified target/desired trajectories. Our attack relies on solving a nonlinear constrained optimization problem where the objective function captures the deviation of the predicted trajectory from a target one while the constraints model physical requirements that the adversarial input should satisfy. The latter ensures that the inputs look natural and they are safe to execute (e.g., they are close to nominal inputs and away from obstacles). We demonstrate the effectiveness of TA4TP on two state-of-the-art DNN models and two datasets. To the best of our knowledge, we propose the first targeted adversarial attack against DNN models used for trajectory forecasting.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115571068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Concentration Phenomenon for Random Dynamical Systems: An Operator Theoretic Approach
Conference on Learning for Dynamics & Control. Pub Date: 2022-12-07. DOI: 10.48550/arXiv.2212.03670
Muhammad Naeem
{"title":"Concentration Phenomenon for Random Dynamical Systems: An Operator Theoretic Approach","authors":"Muhammad Naeem","doi":"10.48550/arXiv.2212.03670","DOIUrl":"https://doi.org/10.48550/arXiv.2212.03670","url":null,"abstract":"Via operator theoretic methods, we formalize the concentration phenomenon for a given observable `$r$' of a discrete time Markov chain with `$mu_{pi}$' as invariant ergodic measure, possibly having support on an unbounded state space. The main contribution of this paper is circumventing tedious probabilistic methods with a study of a composition of the Markov transition operator $P$ followed by a multiplication operator defined by $e^{r}$. It turns out that even if the observable/ reward function is unbounded, but for some for some $q>2$, $|e^{r}|_{q rightarrow 2} propto expbig(mu_{pi}(r) +frac{2q}{q-2}big) $ and $P$ is hyperbounded with norm control $|P|_{2 rightarrow q }<e^{frac{1}{2}[frac{1}{2}-frac{1}{q}]}$, sharp non-asymptotic concentration bounds follow. emph{Transport-entropy} inequality ensures the aforementioned upper bound on multiplication operator for all $q>2$. The role of emph{reversibility} in concentration phenomenon is demystified. These results are particularly useful for the reinforcement learning and controls communities as they allow for concentration inequalities w.r.t standard unbounded obersvables/reward functions where exact knowledge of the system is not available, let alone the reversibility of stationary measure.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125311900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1