Leveraging randomized smoothing for optimal control of nonsmooth dynamical systems

IF 3.7 | Zone 2, Computer Science | Q2 AUTOMATION & CONTROL SYSTEMS
Quentin Le Lidec, Fabian Schramm, Louis Montaut, Cordelia Schmid, Ivan Laptev, Justin Carpentier
Nonlinear Analysis-Hybrid Systems, volume 52, Article 101468. Journal Article, published 2024-02-01.
DOI: 10.1016/j.nahs.2024.101468
https://www.sciencedirect.com/science/article/pii/S1751570X24000050
Citation count: 0

Abstract

Optimal control (OC) algorithms such as differential dynamic programming (DDP) exploit the derivatives of the dynamics to control physical systems efficiently. Yet these algorithms are prone to failure on non-smooth dynamical systems, for reasons such as discontinuities in the derivatives of the dynamics or non-informative gradients. In contrast, reinforcement learning (RL) algorithms have shown better empirical results in scenarios exhibiting non-smooth effects (contacts, friction, etc.). Our approach leverages recent work on randomized smoothing (RS) to tackle the non-smoothness issues commonly encountered in optimal control, and provides key insights into the interplay between RL and OC through the prism of RS methods. This naturally leads us to introduce the randomized Differential Dynamic Programming (RDDP) algorithm, which accounts for deterministic but non-smooth dynamics in a very sample-efficient way. Experiments demonstrate that our method can solve classic robotic problems with dry friction and frictional contacts, where classical OC algorithms are likely to fail and RL algorithms require, in practice, a prohibitive number of samples to find an optimal solution.
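
To make the idea of randomized smoothing concrete, the sketch below estimates the value and gradient of a Gaussian-smoothed surrogate f_sigma(x) = E[f(x + sigma z)], z ~ N(0, I), for a Coulomb-like friction term -sign(v). It is only an illustration of the general RS principle under stated assumptions, not the RDDP algorithm from the paper: the function names (`smoothed_value_and_grad`, `dry_friction`), the Gaussian noise, the sample count, and the zeroth-order estimator are all choices made here for the example.

```python
import numpy as np


def smoothed_value_and_grad(f, x, sigma=0.1, n_samples=4000, seed=0):
    """Monte Carlo estimate of f_sigma(x) = E[f(x + sigma * z)] and its gradient.

    Hypothetical helper for illustration. The gradient uses the zeroth-order
    identity grad f_sigma(x) = E[(f(x + sigma * z) - f(x)) * z] / sigma,
    which needs only evaluations of f, not its derivatives.
    """
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = rng.standard_normal((n_samples, x.size))      # Gaussian perturbations
    fx = f(x)                                          # baseline, reduces variance
    vals = np.array([f(x + sigma * zi) for zi in z])   # perturbed evaluations
    value = vals.mean()
    grad = ((vals - fx)[:, None] * z).mean(axis=0) / sigma
    return value, grad


def dry_friction(v):
    """Coulomb-like friction force: -sign(v). Its derivative is zero away
    from v = 0 and undefined at v = 0, so it carries no useful gradient."""
    return float(-np.sign(v[0]))


if __name__ == "__main__":
    value, grad = smoothed_value_and_grad(dry_friction, [0.0], sigma=0.1)
    # Analytically, the smoothed slope at v = 0 is -sqrt(2 / pi) / sigma.
    print("smoothed value:", value)
    print("smoothed gradient:", grad[0])
    print("analytic gradient:", -np.sqrt(2.0 / np.pi) / 0.1)
```

In this toy case the smoothed surrogate has a well-defined, informative slope at v = 0, whereas the exact derivative of the friction term is zero or undefined everywhere. The noise scale sigma trades bias against gradient variance; how such estimates are incorporated into a DDP-style solver is described in the paper itself and is not reproduced here.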

Source journal
Nonlinear Analysis-Hybrid Systems (AUTOMATION & CONTROL SYSTEMS-MATHEMATICS, APPLIED)
CiteScore: 8.30
Self-citation rate: 9.50%
Articles published: 65
Review time: >12 weeks
Journal description: Nonlinear Analysis: Hybrid Systems welcomes all important research and expository papers in any discipline. Papers that are principally concerned with the theory of hybrid systems should contain significant results indicating relevant applications. Papers that emphasize applications should consist of important real world models and illuminating techniques. Papers that interrelate various aspects of hybrid systems will be most welcome.