Conference on Learning for Dynamics & Control: Latest Publications

Agile Catching with Whole-Body MPC and Blackbox Policy Learning
Conference on Learning for Dynamics & Control · Pub Date: 2023-06-14 · DOI: 10.48550/arXiv.2306.08205
Saminda Abeyruwan, A. Bewley, Nicholas M. Boffi, K. Choromanski, David B. D'Ambrosio, Deepali Jain, P. Sanketi, A. Shankar, Vikas Sindhwani, Sumeet Singh, J. Slotine, Stephen Tu
Abstract: We address a benchmark task in agile robotics: catching objects thrown at high speed. This is a challenging task that involves tracking, intercepting, and cradling a thrown object with access only to visual observations of the object and the proprioceptive state of the robot, all within a fraction of a second. We present the relative merits of two fundamentally different solution strategies: (i) Model Predictive Control using accelerated constrained trajectory optimization, and (ii) Reinforcement Learning using zeroth-order optimization. We provide insights into various performance trade-offs, including sample efficiency, sim-to-real transfer, robustness to distribution shifts, and whole-body multimodality, via extensive on-hardware experiments. We conclude with proposals on fusing "classical" and "learning-based" techniques for agile robot control. Videos of our experiments may be found at https://sites.google.com/view/agile-catching
Citations: 1
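As a rough illustration of strategy (ii), a zeroth-order (blackbox) policy update can be built from an antithetic evolution-strategies gradient estimator. This is a generic sketch on a toy quadratic "reward", not the authors' training pipeline; the step sizes and sample counts are illustrative assumptions:

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, n_samples=64, seed=0):
    """Antithetic zeroth-order estimate of grad E[f(theta + sigma * eps)]."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        eps = rng.standard_normal(theta.shape)
        # Antithetic pairs (+eps, -eps) reduce the variance of the estimate.
        grad += (f(theta + sigma * eps) - f(theta - sigma * eps)) / (2 * sigma) * eps
    return grad / n_samples

# Toy stand-in for an episode-return function: peak at `target`.
target = np.array([1.0, -2.0])
reward = lambda th: -np.sum((th - target) ** 2)

theta = np.zeros(2)
for _ in range(300):
    theta = theta + 0.05 * es_gradient(reward, theta)  # gradient *ascent* on reward
```

Only function evaluations of the reward are used, which is what makes this class of methods attractive when the simulator or hardware is not differentiable.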
Time Dependent Inverse Optimal Control using Trigonometric Basis Functions
Conference on Learning for Dynamics & Control · Pub Date: 2023-06-05 · DOI: 10.48550/arXiv.2306.02820
Rahel Rickenbach, Elena Arcari, M. Zeilinger
Abstract: The choice of objective is critical for the performance of an optimal controller. When control requirements vary during operation, e.g. due to changes in the environment with which the system is interacting, these variations should be reflected in the cost function. In this paper we consider the problem of identifying a time-dependent cost function from given trajectories. We propose a strategy for explicitly representing time dependency in the cost function, i.e. decomposing it into the product of an unknown time-dependent parameter vector and a known state- and input-dependent vector, modelling the former via a linear combination of trigonometric basis functions. These are incorporated within an inverse optimal control framework that uses the Karush-Kuhn-Tucker (KKT) conditions for ensuring optimality, and allows for formulating an optimization problem with respect to a finite set of basis function hyperparameters. Results are shown for two systems in simulation and evaluated against state-of-the-art approaches.
Citations: 0
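The decomposition described above can be sketched in a few lines: build a trigonometric feature matrix and recover a hypothetical scalar time-varying weight by least squares. This illustrates the representation only, not the paper's KKT-based estimator; the signal and hyperparameters are assumptions:

```python
import numpy as np

def trig_basis(t, n_harmonics, period):
    """Features [1, sin(k*w*t), cos(k*w*t)] for k = 1..n_harmonics, w = 2*pi/period."""
    w = 2 * np.pi / period
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols += [np.sin(k * w * t), np.cos(k * w * t)]
    return np.stack(cols, axis=-1)

# Hypothetical time-varying cost weight theta(t), recovered by least squares.
t = np.linspace(0.0, 10.0, 200)
theta_true = 1.5 + 0.8 * np.sin(2 * np.pi * t / 10) - 0.3 * np.cos(4 * np.pi * t / 10)
Phi = trig_basis(t, n_harmonics=2, period=10.0)
coeffs, *_ = np.linalg.lstsq(Phi, theta_true, rcond=None)
theta_hat = Phi @ coeffs
```

Because the hidden weight lies in the span of the first two harmonics, the least-squares reconstruction is exact up to floating-point error; in the paper the coefficients are instead constrained through the optimality conditions of the demonstrated trajectories.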
Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning
Conference on Learning for Dynamics & Control · Pub Date: 2023-05-31 · DOI: 10.48550/arXiv.2306.00212
Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović
Abstract: We examine online safe multi-agent reinforcement learning using constrained Markov games in which agents compete by maximizing their expected total rewards under a constraint on expected total utilities. Our focus is confined to an episodic two-player zero-sum constrained Markov game with independent transition functions that are unknown to agents, adversarial reward functions, and stochastic utility functions. For such a Markov game, we employ an approach based on the occupancy measure to formulate it as an online constrained saddle-point problem with an explicit constraint. We extend the Lagrange multiplier method in constrained optimization to handle the constraint by creating a generalized Lagrangian with minimax decision primal variables and a dual variable. Next, we develop an upper confidence reinforcement learning algorithm to solve this Lagrangian problem while balancing exploration and exploitation. Our algorithm updates the minimax decision primal variables via online mirror descent and the dual variable via a projected gradient step, and we prove that it enjoys a sublinear rate $O((|X|+|Y|)L\sqrt{T(|A|+|B|)})$ for both regret and constraint violation after playing $T$ episodes of the game. Here, $L$ is the horizon of each episode, and $(|X|,|A|)$ and $(|Y|,|B|)$ are the state/action space sizes of the min-player and the max-player, respectively. To the best of our knowledge, we provide the first provably efficient online safe reinforcement learning algorithm in constrained Markov games.
Citations: 1
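The two update rules named in the abstract can be sketched generically. The 3-action toy policy, gradient, and step sizes below are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def mirror_descent_step(p, grad, eta):
    """Entropic mirror descent on the probability simplex (exponentiated gradient)."""
    q = p * np.exp(-eta * grad)
    return q / q.sum()

def dual_step(lam, violation, eta, lam_max):
    """Projected gradient ascent on the Lagrange multiplier, kept in [0, lam_max]."""
    return float(np.clip(lam + eta * violation, 0.0, lam_max))

p = np.full(3, 1.0 / 3.0)                                  # uniform policy over 3 actions
p = mirror_descent_step(p, np.array([1.0, 0.0, -1.0]), eta=0.5)
lam = dual_step(0.0, violation=0.2, eta=1.0, lam_max=10.0)  # constraint currently violated
```

The multiplicative primal update keeps the iterate on the simplex automatically, while the dual clip enforces nonnegativity and boundedness of the multiplier.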
Black-Box vs. Gray-Box: A Case Study on Learning Table Tennis Ball Trajectory Prediction with Spin and Impacts
Conference on Learning for Dynamics & Control · Pub Date: 2023-05-24 · DOI: 10.48550/arXiv.2305.15189
Jan Achterhold, Philip Tobuschat, Hao Ma, Dieter Buechler, Michael Muehlebach, Joerg Stueckler
Abstract: In this paper, we present a method for table tennis ball trajectory filtering and prediction. Our gray-box approach builds on a physical model. At the same time, we use data to learn parameters of the dynamics model, of an extended Kalman filter, and of a neural model that infers the ball's initial condition. We demonstrate superior prediction performance of our approach over two black-box approaches, which are not supplied with physical prior knowledge. We demonstrate that initializing the spin from parameters of the ball launcher using a neural network drastically improves long-time prediction performance over estimating the spin purely from measured ball positions. An accurate prediction of the ball trajectory is crucial for successful returns. We therefore evaluate the return performance with a pneumatic artificial muscular robot and achieve a return rate of 29/30 (96.7%).
Citations: 0
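A minimal version of the physical prior in such a gray-box model is a point-mass ball with quadratic air drag. The sketch below omits spin/Magnus forces and table impacts, and the drag coefficient `k_drag` is a hypothetical placeholder for a parameter that would be learned from data:

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])  # gravity, m/s^2

def ball_step(pos, vel, dt, k_drag):
    """One Euler step of a point-mass ball with quadratic air drag.
    (Spin/Magnus forces and impacts are omitted in this sketch.)"""
    acc = G - k_drag * np.linalg.norm(vel) * vel
    return pos + dt * vel, vel + dt * acc

pos, vel = np.array([0.0, 0.0, 1.0]), np.array([4.0, 0.0, 2.0])
for _ in range(50):  # 0.5 s rollout
    pos, vel = ball_step(pos, vel, dt=0.01, k_drag=0.1)
```

In a gray-box filter, a model like this supplies the process-model rollout, while the measurement update and the initial spin estimate come from learned components.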
Model-based Validation as Probabilistic Inference
Conference on Learning for Dynamics & Control · Pub Date: 2023-05-17 · DOI: 10.48550/arXiv.2305.09930
Harrison Delecki, Anthony Corso, Mykel J. Kochenderfer
Abstract: Estimating the distribution over failures is a key step in validating autonomous systems. Existing approaches focus on finding failures for a small range of initial conditions or make restrictive assumptions about the properties of the system under test. We frame estimating the distribution over failure trajectories for sequential systems as Bayesian inference. Our model-based approach represents the distribution over failure trajectories using rollouts of system dynamics and computes trajectory gradients using automatic differentiation. Our approach is demonstrated on an inverted pendulum control system, an autonomous vehicle driving scenario, and a partially observable lunar lander. Sampling is performed using an off-the-shelf implementation of Hamiltonian Monte Carlo with multiple chains to capture multimodality and gradient smoothing for safe trajectories. In all experiments, we observed improvements in sample efficiency and parameter space coverage compared to black-box baseline approaches. This work is open-sourced.
Citations: 0
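A black-box baseline of the kind the paper improves on can be sketched as rollout-plus-failure-indicator Monte Carlo. The scalar system, noise level, and threshold below are illustrative assumptions, not one of the paper's benchmarks:

```python
import numpy as np

def rollout_fails(x0, w, dt=0.05, threshold=1.0):
    """Roll out a toy mildly unstable scalar system; flag a failure if |x| exceeds threshold."""
    x = x0
    for wk in w:
        x = x + dt * (0.5 * x + wk)  # x' = 0.5 x + disturbance
        if abs(x) > threshold:
            return True
    return False

rng = np.random.default_rng(1)
n_rollouts = 2000
n_fail = sum(rollout_fails(0.0, rng.normal(0.0, 1.0, size=50)) for _ in range(n_rollouts))
p_fail = n_fail / n_rollouts
```

This estimator only learns *whether* each sampled disturbance sequence fails; the paper's contribution is to treat the disturbance sequence itself as a latent variable and sample the failure distribution with gradient-informed HMC instead.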
Toward Multi-Agent Reinforcement Learning for Distributed Event-Triggered Control
Conference on Learning for Dynamics & Control · Pub Date: 2023-05-15 · DOI: 10.48550/arXiv.2305.08723
Lukas Kesper, Sebastian Trimpe, Dominik Baumann
Abstract: Event-triggered communication and control provide high control performance in networked control systems without overloading the communication network. However, most approaches require precise mathematical models of the system dynamics, which may not always be available. Model-free learning of communication and control policies provides an alternative. Nevertheless, existing methods typically consider single-agent settings. This paper proposes a model-free reinforcement learning algorithm that jointly learns resource-aware communication and control policies for distributed multi-agent systems from data. We evaluate the algorithm in a high-dimensional and nonlinear simulation example and discuss promising avenues for further research.
Citations: 0
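A classical, hand-designed event trigger of the kind such methods aim to learn is the send-on-delta rule: transmit only when the state has drifted far enough from the last transmitted value. A sketch (the threshold and toy state trajectory are assumptions, not a learned policy):

```python
import numpy as np

def should_send(x, x_last_sent, threshold):
    """Send-on-delta trigger: communicate only when the local state has drifted enough."""
    return np.linalg.norm(x - x_last_sent) > threshold

x_last = np.zeros(2)
messages = 0
for k in range(100):
    x = np.array([np.sin(0.1 * k), np.cos(0.1 * k)])  # stand-in for an agent's state
    if should_send(x, x_last, threshold=0.5):
        x_last = x           # transmit and remember what was sent
        messages += 1
```

The trigger trades communication for accuracy: a larger threshold sends fewer messages but lets remote estimates drift further, which is exactly the trade-off a learned resource-aware policy must balance.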
Equilibria of Fully Decentralized Learning in Networked Systems
Conference on Learning for Dynamics & Control · Pub Date: 2023-05-15 · DOI: 10.48550/arXiv.2305.09002
Yan Jiang, Wenqi Cui, Baosen Zhang, Jorge Cortés
Abstract: Existing settings of decentralized learning either require players to have full information or require the system to have certain special structure that may be hard to check, hindering their applicability to practical systems. To overcome this, we identify a structure that is simple to check for linear dynamical systems, where each player learns in a fully decentralized fashion to minimize its cost. We first establish the existence of pure strategy Nash equilibria in the resulting noncooperative game. We then conjecture that the Nash equilibrium is unique provided that the system satisfies an additional requirement on its structure. We also introduce a decentralized mechanism based on projected gradient descent to have agents learn the Nash equilibrium. Simulations on a $5$-player game validate our results.
Citations: 1
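The projected-gradient learning mechanism can be sketched on a hypothetical 5-player quadratic game, where each player takes a gradient step only in its own decision variable and projects onto its feasible box. The cost coupling below is an illustrative assumption, not the paper's networked system:

```python
import numpy as np

# Hypothetical 5-player game: player i minimizes
#   J_i(x) = (x_i - c_i)^2 + 0.1 * x_i * sum_{j != i} x_j,
# adjusting only its own coordinate x_i by a projected gradient step.
c = np.array([1.0, -1.0, 0.5, 0.0, 1.5])
x = np.zeros(5)
for _ in range(500):
    own_grad = 2.0 * (x - c) + 0.1 * (x.sum() - x)  # dJ_i/dx_i, stacked over players
    x = np.clip(x - 0.05 * own_grad, -2.0, 2.0)     # projection onto the box [-2, 2]
```

At a Nash equilibrium no player can improve unilaterally, so every player's own-cost gradient (here, `own_grad`) vanishes when the box constraint is inactive.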
Template-Based Piecewise Affine Regression
Conference on Learning for Dynamics & Control · Pub Date: 2023-05-15 · DOI: 10.48550/arXiv.2305.08686
Guillaume O. Berger, S. Sankaranarayanan
Abstract: We investigate the problem of fitting piecewise affine (PWA) functions to data. Our algorithm divides the input domain into finitely many polyhedral regions whose shapes are specified using a user-defined template, such that the data points in each region are fit by an affine function within a desired error bound. We first prove that this problem is NP-hard. Next, we present a top-down algorithm that considers subsets of the overall data set in a systematic manner, trying to fit an affine function for each subset using linear regression. If regression fails on a subset, we extract a minimal set of points that led to the failure in order to split the original index set into smaller subsets. Using a combination of this top-down scheme and a set covering algorithm, we derive an overall approach that is optimal in terms of the number of pieces of the resulting PWA model. We demonstrate our approach on two numerical examples that include PWA approximations of a widely used nonlinear insulin-glucose regulation model and a double inverted pendulum with soft contacts.
Citations: 0
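The top-down fit-then-split idea can be sketched in one dimension, with intervals standing in for polyhedral template regions and a worst-residual split standing in for the paper's minimal-infeasible-subset extraction (so this greedy sketch carries no optimality guarantee):

```python
import numpy as np

def pwa_fit(x, y, tol):
    """Greedy 1-D stand-in for top-down PWA regression: least-squares affine fit per
    interval; if the fit misses tol, split at the worst interior point and recurse."""
    pieces, stack = [], [(0, len(x))]
    while stack:
        lo, hi = stack.pop()
        A = np.stack([x[lo:hi], np.ones(hi - lo)], axis=1)
        coef, *_ = np.linalg.lstsq(A, y[lo:hi], rcond=None)
        err = np.abs(A @ coef - y[lo:hi])
        if err.max() <= tol or hi - lo <= 2:
            pieces.append((lo, hi, coef))
        else:
            cut = lo + 1 + int(np.argmax(err[1:-1]))  # split at worst interior sample
            stack += [(lo, cut), (cut, hi)]
    return sorted(pieces, key=lambda p: p[0])

x = np.linspace(-1.0, 1.0, 101)
pieces = pwa_fit(x, np.abs(x), tol=1e-6)  # |x| is exactly PWA with one kink at 0
```

On `|x|` the single affine fit fails worst at the kink, so one split there yields two exactly-fit pieces.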
A Generalizable Physics-informed Learning Framework for Risk Probability Estimation
Conference on Learning for Dynamics & Control · Pub Date: 2023-05-10 · DOI: 10.48550/arXiv.2305.06432
Zhuoyuan Wang, Yorie Nakahira
Abstract: Accurate estimates of long-term risk probabilities and their gradients are critical for many stochastic safe control methods. However, computing such risk probabilities in real time and in unseen or changing environments is challenging. Monte Carlo (MC) methods cannot accurately evaluate the probabilities and their gradients, as an infinitesimal divisor can amplify the sampling noise. In this paper, we develop an efficient method to evaluate the probabilities of long-term risk and their gradients. The proposed method exploits the fact that long-term risk probability satisfies certain partial differential equations (PDEs), which characterize the neighboring relations between the probabilities, to integrate MC methods and physics-informed neural networks. We provide theoretical guarantees on the estimation error given certain choices of training configurations. Numerical results show the proposed method has better sample efficiency, generalizes well to unseen regions, and can adapt to systems with changing parameters. The proposed method can also accurately estimate the gradients of risk probabilities, which enables first- and second-order techniques on risk probabilities to be used for learning and control.
Citations: 0
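For intuition on the MC baseline being improved upon, here is a plain Monte Carlo estimate of a long-term risk probability for Brownian motion, checked against the closed form from the reflection principle (a textbook toy, not the paper's physics-informed method; the barrier and horizon are arbitrary choices):

```python
import numpy as np
from math import erf, sqrt

def mc_risk(b, T, n_paths=20000, n_steps=200, seed=0):
    """Monte Carlo estimate of the long-term risk P(max_{t<=T} W_t >= b)."""
    rng = np.random.default_rng(seed)
    dW = rng.normal(0.0, sqrt(T / n_steps), size=(n_paths, n_steps))
    return float((np.cumsum(dW, axis=1).max(axis=1) >= b).mean())

def true_risk(b, T):
    """Reflection principle for Brownian motion: P(max W >= b) = 2 * P(W_T >= b)."""
    return 1.0 - erf(b / sqrt(2 * T))

p_hat = mc_risk(b=1.0, T=1.0)
```

The MC estimate carries both sampling noise and a discretization bias (the discrete-time maximum misses crossings between steps), which is the kind of error a PDE-constrained estimator can suppress.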
The Impact of the Geometric Properties of the Constraint Set in Safe Optimization with Bandit Feedback
Conference on Learning for Dynamics & Control · Pub Date: 2023-05-01 · DOI: 10.48550/arXiv.2305.00889
Spencer Hutchinson, Berkay Turan, M. Alizadeh
Abstract: We consider a safe optimization problem with bandit feedback in which an agent sequentially chooses actions and observes responses from the environment, with the goal of maximizing an arbitrary function of the response while respecting stage-wise constraints. We propose an algorithm for this problem, and study how the geometric properties of the constraint set impact the regret of the algorithm. In order to do so, we introduce the notion of the sharpness of a particular constraint set, which characterizes the difficulty of performing learning within the constraint set in an uncertain setting. This concept of sharpness allows us to identify the class of constraint sets for which the proposed algorithm is guaranteed to enjoy sublinear regret. Simulation results for this algorithm support the sublinear regret bound and provide empirical evidence that the sharpness of the constraint set impacts the performance of the algorithm.
Citations: 2
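One geometric ingredient behind such stage-wise safety guarantees is that, for a convex constraint set containing a known strictly feasible point, blending any proposal toward that point restores feasibility; how aggressively one must blend depends on the set's geometry. A hypothetical norm-ball example (not the paper's algorithm):

```python
import numpy as np

def safe_action(x_proposed, x_safe, gamma):
    """Shrink a proposed action toward a known safe one; for a convex constraint
    set this is feasible once gamma is large enough relative to the uncertainty."""
    return (1.0 - gamma) * x_proposed + gamma * x_safe

# Constraint set: the unit ball. x_safe = 0 is strictly feasible.
x_prop = np.array([1.2, 0.9])                       # infeasible proposal (norm 1.5)
x = safe_action(x_prop, np.zeros(2), gamma=0.5)     # blended action, norm 0.75
```

The cost of blending is optimization performance, which is why the achievable regret depends on how sharply the constraint set curves away from its boundary.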