Stochastic Trajectory Optimization for Robotic Skill Acquisition From a Suboptimal Demonstration

IF 4.6 | Zone 2, Computer Science | Q2 Robotics

IEEE Robotics and Automation Letters Pub Date : 2025-04-24 DOI:10.1109/LRA.2025.3564208

Chenlin Ming;Zitong Wang;Boxuan Zhang;Zhanxiang Cao;Xiaoming Duan;Jianping He

IEEE Robotics and Automation Letters, vol. 10, no. 6, pp. 6127-6134. Publication type: Journal Article. Open access: no. https://ieeexplore.ieee.org/document/10976394/

Citations: 0

Abstract

Learning from Demonstration (LfD) has emerged as a crucial method for robots to acquire new skills. However, when given suboptimal task trajectory demonstrations with shape characteristics reflecting human preferences but subpar dynamic attributes such as slow motion, robots not only need to mimic the behaviors but also optimize the dynamic performance. In this work, we leverage optimization-based methods to search for a superior-performing trajectory whose shape is similar to that of the demonstrated trajectory. Specifically, we use Dynamic Time Warping (DTW) to quantify the difference between two trajectories and combine it with additional performance metrics, such as collision cost, to construct the cost function. Moreover, we develop a multi-policy version of the Stochastic Trajectory Optimization for Motion Planning (STOMP), called MSTOMP, which is more stable and robust to parameter changes. To deal with the jitter in the demonstrated trajectory, we further utilize the gain-controlling method in the frequency domain to denoise the demonstration and propose a computationally more efficient metric, called Mean Square Error in the Spectrum (MSES), that measures the trajectories' differences in the frequency domain. We also theoretically highlight the connections between the time domain and the frequency domain methods. Finally, we verify our method in both simulation experiments and real-world experiments, showcasing its improved optimization performance and stability compared to existing methods.
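The two shape measures named in the abstract can be illustrated with a minimal sketch. The DTW routine below is the textbook dynamic-programming recurrence. The function `spectral_mse` is only a hypothetical stand-in for MSES: the abstract does not give the paper's definition, so this version simply assumes equal-length trajectories and compares their discrete Fourier transforms. All function names here are illustrative and not taken from the paper.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic O(n*m) Dynamic Time Warping distance between two trajectories.

    a, b: arrays of shape (n,) or (n, d); the lengths may differ.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # acc[i, j] = minimal accumulated cost of aligning a[:i] with b[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = np.linalg.norm(a[i - 1] - b[j - 1])
            acc[i, j] = step + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


def spectral_mse(a, b):
    """Hypothetical frequency-domain error (NOT the paper's MSES definition).

    Assumes equal-length trajectories. Returns the mean squared magnitude of
    the DFT difference, scaled by 1/N so that it matches the time-domain MSE
    (Parseval's theorem).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("spectral_mse assumes trajectories of equal shape")
    diff = np.fft.fft(a, axis=0) - np.fft.fft(b, axis=0)
    return float(np.mean(np.abs(diff) ** 2) / a.shape[0])


# A demonstration and a "slow" copy that traces the same shape at half speed.
t = np.linspace(0.0, np.pi, 50)
demo = np.sin(t)
slow_demo = np.repeat(demo, 2)
noisy = demo + 0.05 * np.cos(7.0 * t)

shape_gap = dtw_distance(demo, slow_demo)  # timing differs, shape does not
freq_gap = spectral_mse(demo, noisy)
```

In this sketch the slowed-down copy has zero DTW distance to the original, which is the property that makes DTW a natural shape measure when the demonstration is slow. The spectral version needs a single FFT per trajectory rather than a quadratic table, although whether this matches the efficiency argument in the paper is an assumption.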


Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore: 9.60

Self-citation rate: 15.40%

Articles published: 1428

About the journal: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.