Conference on Learning for Dynamics & Control: Latest Publications

Agile Catching with Whole-Body MPC and Blackbox Policy Learning
Conference on Learning for Dynamics & Control · Pub Date: 2023-06-14 · DOI: 10.48550/arXiv.2306.08205
Saminda Abeyruwan, A. Bewley, Nicholas M. Boffi, K. Choromanski, David B. D'Ambrosio, Deepali Jain, P. Sanketi, A. Shankar, Vikas Sindhwani, Sumeet Singh, J. Slotine, Stephen Tu
Abstract: We address a benchmark task in agile robotics: catching objects thrown at high speed. This is a challenging task that involves tracking, intercepting, and cradling a thrown object with access only to visual observations of the object and the proprioceptive state of the robot, all within a fraction of a second. We present the relative merits of two fundamentally different solution strategies: (i) Model Predictive Control using accelerated constrained trajectory optimization, and (ii) Reinforcement Learning using zeroth-order optimization. We provide insights into various performance trade-offs, including sample efficiency, sim-to-real transfer, robustness to distribution shifts, and whole-body multimodality, via extensive on-hardware experiments. We conclude with proposals on fusing "classical" and "learning-based" techniques for agile robot control. Videos of our experiments may be found at https://sites.google.com/view/agile-catching
Citations: 1
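As a rough illustration of strategy (ii), a zeroth-order (blackbox) policy update can be built from an antithetic evolution-strategies gradient estimator. This is a generic sketch on a toy quadratic "reward", not the authors' training pipeline; the step sizes and sample counts are illustrative assumptions:

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, n_samples=64, seed=0):
    """Antithetic zeroth-order estimate of grad E[f(theta + sigma * eps)]."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        eps = rng.standard_normal(theta.shape)
        # Antithetic pairs (+eps, -eps) reduce the variance of the estimate.
        grad += (f(theta + sigma * eps) - f(theta - sigma * eps)) / (2 * sigma) * eps
    return grad / n_samples

# Toy stand-in for an episode-return function: peak at `target`.
target = np.array([1.0, -2.0])
reward = lambda th: -np.sum((th - target) ** 2)

theta = np.zeros(2)
for _ in range(300):
    theta = theta + 0.05 * es_gradient(reward, theta)  # gradient *ascent* on reward
```

Only function evaluations of the reward are used, which is what makes this class of methods attractive when the simulator or hardware is not differentiable.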
Time Dependent Inverse Optimal Control using Trigonometric Basis Functions
Conference on Learning for Dynamics & Control · Pub Date: 2023-06-05 · DOI: 10.48550/arXiv.2306.02820
Rahel Rickenbach, Elena Arcari, M. Zeilinger
Abstract: The choice of objective is critical for the performance of an optimal controller. When control requirements vary during operation, e.g. due to changes in the environment with which the system is interacting, these variations should be reflected in the cost function. In this paper we consider the problem of identifying a time-dependent cost function from given trajectories. We propose a strategy for explicitly representing time dependency in the cost function, i.e. decomposing it into the product of an unknown time-dependent parameter vector and a known state- and input-dependent vector, modelling the former via a linear combination of trigonometric basis functions. These are incorporated within an inverse optimal control framework that uses the Karush-Kuhn-Tucker (KKT) conditions for ensuring optimality, and allows for formulating an optimization problem with respect to a finite set of basis function hyperparameters. Results are shown for two systems in simulation and evaluated against state-of-the-art approaches.
Citations: 0
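The decomposition described above can be sketched in a few lines: build a trigonometric feature matrix and recover a hypothetical scalar time-varying weight by least squares. This illustrates the representation only, not the paper's KKT-based estimator; the signal and hyperparameters are assumptions:

```python
import numpy as np

def trig_basis(t, n_harmonics, period):
    """Features [1, sin(k*w*t), cos(k*w*t)] for k = 1..n_harmonics, w = 2*pi/period."""
    w = 2 * np.pi / period
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols += [np.sin(k * w * t), np.cos(k * w * t)]
    return np.stack(cols, axis=-1)

# Hypothetical time-varying cost weight theta(t), recovered by least squares.
t = np.linspace(0.0, 10.0, 200)
theta_true = 1.5 + 0.8 * np.sin(2 * np.pi * t / 10) - 0.3 * np.cos(4 * np.pi * t / 10)
Phi = trig_basis(t, n_harmonics=2, period=10.0)
coeffs, *_ = np.linalg.lstsq(Phi, theta_true, rcond=None)
theta_hat = Phi @ coeffs
```

Because the hidden weight lies in the span of the first two harmonics, the least-squares reconstruction is exact up to floating-point error; in the paper the coefficients are instead constrained through the optimality conditions of the demonstrated trajectories.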
Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning
Conference on Learning for Dynamics & Control · Pub Date: 2023-05-31 · DOI: 10.48550/arXiv.2306.00212
Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović
Abstract: We examine online safe multi-agent reinforcement learning using constrained Markov games in which agents compete by maximizing their expected total rewards under a constraint on expected total utilities. Our focus is confined to an episodic two-player zero-sum constrained Markov game with independent transition functions that are unknown to agents, adversarial reward functions, and stochastic utility functions. For such a Markov game, we employ an approach based on the occupancy measure to formulate it as an online constrained saddle-point problem with an explicit constraint. We extend the Lagrange multiplier method in constrained optimization to handle the constraint by creating a generalized Lagrangian with minimax decision primal variables and a dual variable. Next, we develop an upper confidence reinforcement learning algorithm to solve this Lagrangian problem while balancing exploration and exploitation. Our algorithm updates the minimax decision primal variables via online mirror descent and the dual variable via a projected gradient step, and we prove that it enjoys a sublinear rate $O((|X|+|Y|)L\sqrt{T(|A|+|B|)})$ for both regret and constraint violation after playing $T$ episodes of the game. Here, $L$ is the horizon of each episode, and $(|X|,|A|)$ and $(|Y|,|B|)$ are the state/action space sizes of the min-player and the max-player, respectively. To the best of our knowledge, we provide the first provably efficient online safe reinforcement learning algorithm in constrained Markov games.
Citations: 1
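The two update rules named in the abstract can be sketched generically. The 3-action toy policy, gradient, and step sizes below are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def mirror_descent_step(p, grad, eta):
    """Entropic mirror descent on the probability simplex (exponentiated gradient)."""
    q = p * np.exp(-eta * grad)
    return q / q.sum()

def dual_step(lam, violation, eta, lam_max):
    """Projected gradient ascent on the Lagrange multiplier, kept in [0, lam_max]."""
    return float(np.clip(lam + eta * violation, 0.0, lam_max))

p = np.full(3, 1.0 / 3.0)                                  # uniform policy over 3 actions
p = mirror_descent_step(p, np.array([1.0, 0.0, -1.0]), eta=0.5)
lam = dual_step(0.0, violation=0.2, eta=1.0, lam_max=10.0)  # constraint currently violated
```

The multiplicative primal update keeps the iterate on the simplex automatically, while the dual clip enforces nonnegativity and boundedness of the multiplier.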
Black-Box vs. Gray-Box: A Case Study on Learning Table Tennis Ball Trajectory Prediction with Spin and Impacts
Conference on Learning for Dynamics & Control · Pub Date: 2023-05-24 · DOI: 10.48550/arXiv.2305.15189
Jan Achterhold, Philip Tobuschat, Hao Ma, Dieter Buechler, Michael Muehlebach, Joerg Stueckler
Abstract: In this paper, we present a method for table tennis ball trajectory filtering and prediction. Our gray-box approach builds on a physical model. At the same time, we use data to learn parameters of the dynamics model, of an extended Kalman filter, and of a neural model that infers the ball's initial condition. We demonstrate superior prediction performance of our approach over two black-box approaches, which are not supplied with physical prior knowledge. We demonstrate that initializing the spin from parameters of the ball launcher using a neural network drastically improves long-time prediction performance over estimating the spin purely from measured ball positions. An accurate prediction of the ball trajectory is crucial for successful returns. We therefore evaluate the return performance with a pneumatic artificial muscular robot and achieve a return rate of 29/30 (96.7%).
Citations: 0
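A minimal version of the physical prior in such a gray-box model is a point-mass ball with quadratic air drag. The sketch below omits spin/Magnus forces and table impacts, and the drag coefficient `k_drag` is a hypothetical placeholder for a parameter that would be learned from data:

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])  # gravity, m/s^2

def ball_step(pos, vel, dt, k_drag):
    """One Euler step of a point-mass ball with quadratic air drag.
    (Spin/Magnus forces and impacts are omitted in this sketch.)"""
    acc = G - k_drag * np.linalg.norm(vel) * vel
    return pos + dt * vel, vel + dt * acc

pos, vel = np.array([0.0, 0.0, 1.0]), np.array([4.0, 0.0, 2.0])
for _ in range(50):  # 0.5 s rollout
    pos, vel = ball_step(pos, vel, dt=0.01, k_drag=0.1)
```

In a gray-box filter, a model like this supplies the process-model rollout, while the measurement update and the initial spin estimate come from learned components.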
Model-based Validation as Probabilistic Inference
Conference on Learning for Dynamics & Control · Pub Date: 2023-05-17 · DOI: 10.48550/arXiv.2305.09930
Harrison Delecki, Anthony Corso, Mykel J. Kochenderfer
Abstract: Estimating the distribution over failures is a key step in validating autonomous systems. Existing approaches focus on finding failures for a small range of initial conditions or make restrictive assumptions about the properties of the system under test. We frame estimating the distribution over failure trajectories for sequential systems as Bayesian inference. Our model-based approach represents the distribution over failure trajectories using rollouts of system dynamics and computes trajectory gradients using automatic differentiation. Our approach is demonstrated on an inverted pendulum control system, an autonomous vehicle driving scenario, and a partially observable lunar lander. Sampling is performed using an off-the-shelf implementation of Hamiltonian Monte Carlo with multiple chains to capture multimodality and gradient smoothing for safe trajectories. In all experiments, we observed improvements in sample efficiency and parameter space coverage compared to black-box baseline approaches. This work is open-sourced.
Citations: 0
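A black-box baseline of the kind the paper improves on can be sketched as rollout-plus-failure-indicator Monte Carlo. The scalar system, noise level, and threshold below are illustrative assumptions, not one of the paper's benchmarks:

```python
import numpy as np

def rollout_fails(x0, w, dt=0.05, threshold=1.0):
    """Roll out a toy mildly unstable scalar system; flag a failure if |x| exceeds threshold."""
    x = x0
    for wk in w:
        x = x + dt * (0.5 * x + wk)  # x' = 0.5 x + disturbance
        if abs(x) > threshold:
            return True
    return False

rng = np.random.default_rng(1)
n_rollouts = 2000
n_fail = sum(rollout_fails(0.0, rng.normal(0.0, 1.0, size=50)) for _ in range(n_rollouts))
p_fail = n_fail / n_rollouts
```

This estimator only learns *whether* each sampled disturbance sequence fails; the paper's contribution is to treat the disturbance sequence itself as a latent variable and sample the failure distribution with gradient-informed HMC instead.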
Toward Multi-Agent Reinforcement Learning for Distributed Event-Triggered Control
Conference on Learning for Dynamics & Control · Pub Date: 2023-05-15 · DOI: 10.48550/arXiv.2305.08723
Lukas Kesper, Sebastian Trimpe, Dominik Baumann
Abstract: Event-triggered communication and control provide high control performance in networked control systems without overloading the communication network. However, most approaches require precise mathematical models of the system dynamics, which may not always be available. Model-free learning of communication and control policies provides an alternative. Nevertheless, existing methods typically consider single-agent settings. This paper proposes a model-free reinforcement learning algorithm that jointly learns resource-aware communication and control policies for distributed multi-agent systems from data. We evaluate the algorithm in a high-dimensional and nonlinear simulation example and discuss promising avenues for further research.
Citations: 0
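A classical, hand-designed event trigger of the kind such methods aim to learn is the send-on-delta rule: transmit only when the state has drifted far enough from the last transmitted value. A sketch (the threshold and toy state trajectory are assumptions, not a learned policy):

```python
import numpy as np

def should_send(x, x_last_sent, threshold):
    """Send-on-delta trigger: communicate only when the local state has drifted enough."""
    return np.linalg.norm(x - x_last_sent) > threshold

x_last = np.zeros(2)
messages = 0
for k in range(100):
    x = np.array([np.sin(0.1 * k), np.cos(0.1 * k)])  # stand-in for an agent's state
    if should_send(x, x_last, threshold=0.5):
        x_last = x           # transmit and remember what was sent
        messages += 1
```

The trigger trades communication for accuracy: a larger threshold sends fewer messages but lets remote estimates drift further, which is exactly the trade-off a learned resource-aware policy must balance.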
Equilibria of Fully Decentralized Learning in Networked Systems
Conference on Learning for Dynamics & Control · Pub Date: 2023-05-15 · DOI: 10.48550/arXiv.2305.09002
Yan Jiang, Wenqi Cui, Baosen Zhang, Jorge Cortés
Abstract: Existing settings of decentralized learning either require players to have full information or require the system to have certain special structure that may be hard to check, hindering their applicability to practical systems. To overcome this, we identify a structure that is simple to check for linear dynamical systems, where each player learns in a fully decentralized fashion to minimize its cost. We first establish the existence of pure strategy Nash equilibria in the resulting noncooperative game. We then conjecture that the Nash equilibrium is unique provided that the system satisfies an additional requirement on its structure. We also introduce a decentralized mechanism based on projected gradient descent to have agents learn the Nash equilibrium. Simulations on a $5$-player game validate our results.
Citations: 1
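The projected-gradient learning mechanism can be sketched on a hypothetical 5-player quadratic game, where each player takes a gradient step only in its own decision variable and projects onto its feasible box. The cost coupling below is an illustrative assumption, not the paper's networked system:

```python
import numpy as np

# Hypothetical 5-player game: player i minimizes
#   J_i(x) = (x_i - c_i)^2 + 0.1 * x_i * sum_{j != i} x_j,
# adjusting only its own coordinate x_i by a projected gradient step.
c = np.array([1.0, -1.0, 0.5, 0.0, 1.5])
x = np.zeros(5)
for _ in range(500):
    own_grad = 2.0 * (x - c) + 0.1 * (x.sum() - x)  # dJ_i/dx_i, stacked over players
    x = np.clip(x - 0.05 * own_grad, -2.0, 2.0)     # projection onto the box [-2, 2]
```

At a Nash equilibrium no player can improve unilaterally, so every player's own-cost gradient (here, `own_grad`) vanishes when the box constraint is inactive.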
Template-Based Piecewise Affine Regression
Conference on Learning for Dynamics & Control · Pub Date: 2023-05-15 · DOI: 10.48550/arXiv.2305.08686
Guillaume O. Berger, S. Sankaranarayanan
Abstract: We investigate the problem of fitting piecewise affine (PWA) functions to data. Our algorithm divides the input domain into finitely many polyhedral regions whose shapes are specified using a user-defined template, such that the data points in each region are fit by an affine function within a desired error bound. We first prove that this problem is NP-hard. Next, we present a top-down algorithm that considers subsets of the overall data set in a systematic manner, trying to fit an affine function for each subset using linear regression. If regression fails on a subset, we extract a minimal set of points that led to the failure in order to split the original index set into smaller subsets. Using a combination of this top-down scheme and a set covering algorithm, we derive an overall approach that is optimal in terms of the number of pieces of the resulting PWA model. We demonstrate our approach on two numerical examples that include PWA approximations of a widely used nonlinear insulin-glucose regulation model and a double inverted pendulum with soft contacts.
Citations: 0
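The top-down fit-then-split idea can be sketched in one dimension, with intervals standing in for polyhedral template regions and a worst-residual split standing in for the paper's minimal-infeasible-subset extraction (so this greedy sketch carries no optimality guarantee):

```python
import numpy as np

def pwa_fit(x, y, tol):
    """Greedy 1-D stand-in for top-down PWA regression: least-squares affine fit per
    interval; if the fit misses tol, split at the worst interior point and recurse."""
    pieces, stack = [], [(0, len(x))]
    while stack:
        lo, hi = stack.pop()
        A = np.stack([x[lo:hi], np.ones(hi - lo)], axis=1)
        coef, *_ = np.linalg.lstsq(A, y[lo:hi], rcond=None)
        err = np.abs(A @ coef - y[lo:hi])
        if err.max() <= tol or hi - lo <= 2:
            pieces.append((lo, hi, coef))
        else:
            cut = lo + 1 + int(np.argmax(err[1:-1]))  # split at worst interior sample
            stack += [(lo, cut), (cut, hi)]
    return sorted(pieces, key=lambda p: p[0])

x = np.linspace(-1.0, 1.0, 101)
pieces = pwa_fit(x, np.abs(x), tol=1e-6)  # |x| is exactly PWA with one kink at 0
```

On `|x|` the single affine fit fails worst at the kink, so one split there yields two exactly-fit pieces.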
A Generalizable Physics-informed Learning Framework for Risk Probability Estimation
Conference on Learning for Dynamics & Control · Pub Date: 2023-05-10 · DOI: 10.48550/arXiv.2305.06432
Zhuoyuan Wang, Yorie Nakahira
Abstract: Accurate estimates of long-term risk probabilities and their gradients are critical for many stochastic safe control methods. However, computing such risk probabilities in real time and in unseen or changing environments is challenging. Monte Carlo (MC) methods cannot accurately evaluate the probabilities and their gradients, as an infinitesimal divisor can amplify the sampling noise. In this paper, we develop an efficient method to evaluate the probabilities of long-term risk and their gradients. The proposed method exploits the fact that long-term risk probability satisfies certain partial differential equations (PDEs), which characterize the neighboring relations between the probabilities, to integrate MC methods and physics-informed neural networks. We provide theoretical guarantees on the estimation error given certain choices of training configurations. Numerical results show the proposed method has better sample efficiency, generalizes well to unseen regions, and can adapt to systems with changing parameters. The proposed method can also accurately estimate the gradients of risk probabilities, which enables first- and second-order techniques on risk probabilities to be used for learning and control.
Citations: 0
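For intuition on the MC baseline being improved upon, here is a plain Monte Carlo estimate of a long-term risk probability for Brownian motion, checked against the closed form from the reflection principle (a textbook toy, not the paper's physics-informed method; the barrier and horizon are arbitrary choices):

```python
import numpy as np
from math import erf, sqrt

def mc_risk(b, T, n_paths=20000, n_steps=200, seed=0):
    """Monte Carlo estimate of the long-term risk P(max_{t<=T} W_t >= b)."""
    rng = np.random.default_rng(seed)
    dW = rng.normal(0.0, sqrt(T / n_steps), size=(n_paths, n_steps))
    return float((np.cumsum(dW, axis=1).max(axis=1) >= b).mean())

def true_risk(b, T):
    """Reflection principle for Brownian motion: P(max W >= b) = 2 * P(W_T >= b)."""
    return 1.0 - erf(b / sqrt(2 * T))

p_hat = mc_risk(b=1.0, T=1.0)
```

The MC estimate carries both sampling noise and a discretization bias (the discrete-time maximum misses crossings between steps), which is the kind of error a PDE-constrained estimator can suppress.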
The Impact of the Geometric Properties of the Constraint Set in Safe Optimization with Bandit Feedback
Conference on Learning for Dynamics & Control · Pub Date: 2023-05-01 · DOI: 10.48550/arXiv.2305.00889
Spencer Hutchinson, Berkay Turan, M. Alizadeh
Abstract: We consider a safe optimization problem with bandit feedback in which an agent sequentially chooses actions and observes responses from the environment, with the goal of maximizing an arbitrary function of the response while respecting stage-wise constraints. We propose an algorithm for this problem, and study how the geometric properties of the constraint set impact the regret of the algorithm. In order to do so, we introduce the notion of the sharpness of a particular constraint set, which characterizes the difficulty of performing learning within the constraint set in an uncertain setting. This concept of sharpness allows us to identify the class of constraint sets for which the proposed algorithm is guaranteed to enjoy sublinear regret. Simulation results for this algorithm support the sublinear regret bound and provide empirical evidence that the sharpness of the constraint set impacts the performance of the algorithm.
Citations: 2
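One geometric ingredient behind such stage-wise safety guarantees is that, for a convex constraint set containing a known strictly feasible point, blending any proposal toward that point restores feasibility; how aggressively one must blend depends on the set's geometry. A hypothetical norm-ball example (not the paper's algorithm):

```python
import numpy as np

def safe_action(x_proposed, x_safe, gamma):
    """Shrink a proposed action toward a known safe one; for a convex constraint
    set this is feasible once gamma is large enough relative to the uncertainty."""
    return (1.0 - gamma) * x_proposed + gamma * x_safe

# Constraint set: the unit ball. x_safe = 0 is strictly feasible.
x_prop = np.array([1.2, 0.9])                       # infeasible proposal (norm 1.5)
x = safe_action(x_prop, np.zeros(2), gamma=0.5)     # blended action, norm 0.75
```

The cost of blending is optimization performance, which is why the achievable regret depends on how sharply the constraint set curves away from its boundary.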