AutoLfD: Closing the Loop for Learning From Demonstrations

IF 6.4 | Zone 2, Computer Science | JCR Q1, AUTOMATION & CONTROL SYSTEMS

IEEE Transactions on Automation Science and Engineering Pub Date : 2025-01-22 DOI:10.1109/TASE.2025.3532820

Shaokang Wu;Yijin Wang;Yanlong Huang

Volume 22, pp. 11124-11138. Journal Article. https://ieeexplore.ieee.org/document/10849584/

Citations: 0

Abstract


AutoLfD: Closing the Loop for Learning From Demonstrations

Over the past few years, there have been numerous works towards advancing the generalization capability of robots, among which learning from demonstrations (LfD) has drawn much attention by virtue of its user-friendly and data-efficient nature. While many LfD solutions have been reported, a key question has not been properly addressed: how can we evaluate the generalization performance of LfD? For instance, when a robot draws a letter that needs to pass through new desired points, how does it ensure the new trajectory maintains a similar shape to the demonstration? This question becomes more relevant when a new task is significantly far from the demonstrated region. To tackle this issue, a user often resorts to manual tuning of the hyperparameters of an LfD approach until a satisfactory trajectory is attained. In this paper, we aim to provide closed-loop evaluative feedback for LfD and optimize LfD in an automatic fashion. Specifically, we consider dynamical movement primitives (DMP) and kernelized movement primitives (KMP) as examples and develop a generic optimization framework capable of measuring the generalization performance of DMP and KMP and auto-optimizing their hyperparameters. Evaluations including peg-in-hole, block-stacking and pushing tasks on a real robot evidence the applicability of our framework.

Note to Practitioners: The paper is motivated by the demand to transfer human skills to robots. While the problems of ‘what to learn’ and ‘how to learn’ have been long-standing research topics, the solutions for evaluating the quality of such skill transfer remain largely open. We introduce a novel closed-loop framework towards transferring human skills to robots in an automatic manner. Specifically, we collect a training dataset that reflects user preference for trajectory adaptation and train a trajectory encoder network using the dataset. With the encoder network, we design a robust metric to measure the skill transfer quality and subsequently employ the metric to guide imitation learning of human skills. By using our framework, unseen robotic tasks can be tackled by adapting the demonstrations straightforwardly, where relevant hyperparameters involved in skill transfer are optimized automatically.
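The closed loop described in the abstract (generate an adapted trajectory, score it, adjust the hyperparameters) can be illustrated with a toy example. The following is only a minimal sketch under stated assumptions, not the authors' method: it uses a textbook 1-D DMP, and it replaces the paper's learned encoder-based metric with a hand-crafted normalized-shape error. The function names (`fit_dmp`, `rollout`, `shape_error`, `auto_tune`), the synthetic demonstration, and the candidate hyperparameter grids are all hypothetical.

```python
import math

def _derivative(values, dt):
    """Central finite differences, one-sided at both ends."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((values[hi] - values[lo]) / ((hi - lo) * dt))
    return out

def _basis(x, centers, widths):
    """Gaussian basis activations at phase x."""
    return [math.exp(-h * (x - c) ** 2) for c, h in zip(centers, widths)]

def fit_dmp(demo, n_basis, alpha_z, alpha_x=4.0):
    """Fit a 1-D discrete DMP (unit duration) to a uniformly sampled demo."""
    n = len(demo)
    dt = 1.0 / (n - 1)
    beta_z = alpha_z / 4.0  # critically damped spring-damper
    y0, g = demo[0], demo[-1]
    vel = _derivative(demo, dt)
    acc = _derivative(vel, dt)
    # Basis centers equally spaced in time, hence exponentially in phase.
    centers = [math.exp(-alpha_x * k / (n_basis - 1)) for k in range(n_basis)]
    gaps = [centers[k] - centers[k + 1] for k in range(n_basis - 1)]
    widths = [1.0 / gap ** 2 for gap in gaps] + [1.0 / gaps[-1] ** 2]
    num = [0.0] * n_basis
    den = [1e-12] * n_basis
    for i in range(n):
        x = math.exp(-alpha_x * i * dt)
        # Forcing term required by the demo, normalized by the start-goal span.
        f = (acc[i] - alpha_z * (beta_z * (g - demo[i]) - vel[i])) / (g - y0)
        for k, psi in enumerate(_basis(x, centers, widths)):
            num[k] += psi * x * f  # locally weighted regression per basis
            den[k] += psi * x * x
    return {"weights": [a / b for a, b in zip(num, den)], "centers": centers,
            "widths": widths, "alpha_z": alpha_z, "beta_z": beta_z,
            "alpha_x": alpha_x}

def rollout(dmp, y0, goal, n_steps):
    """Euler-integrate the DMP from y0 (at rest) towards a possibly new goal."""
    dt = 1.0 / (n_steps - 1)
    y, z = y0, 0.0
    traj = [y]
    for i in range(n_steps - 1):
        x = math.exp(-dmp["alpha_x"] * i * dt)
        psi = _basis(x, dmp["centers"], dmp["widths"])
        mix = sum(p * w for p, w in zip(psi, dmp["weights"])) / (sum(psi) + 1e-12)
        f = mix * x * (goal - y0)
        dz = dmp["alpha_z"] * (dmp["beta_z"] * (goal - y) - z) + f
        y, z = y + z * dt, z + dz * dt
        traj.append(y)
    return traj

def shape_error(traj, goal, demo):
    """Hand-crafted stand-in metric: MSE between span-normalized curves."""
    a = [(y - traj[0]) / (goal - traj[0]) for y in traj]
    b = [(y - demo[0]) / (demo[-1] - demo[0]) for y in demo]
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(demo)

def auto_tune(demo, y0, goal, basis_options=(5, 10, 20),
              gain_options=(10.0, 25.0, 50.0)):
    """Grid search: keep the hyperparameters with the lowest shape error."""
    best = None
    for n_basis in basis_options:
        for alpha_z in gain_options:
            dmp = fit_dmp(demo, n_basis, alpha_z)
            traj = rollout(dmp, y0, goal, len(demo))
            score = shape_error(traj, goal, demo)
            if best is None or score < best[0]:
                best = (score, n_basis, alpha_z, traj)
    return best

# Synthetic demo: minimum-jerk profile s(t) reshaped to overshoot its goal.
ts = [i / 199 for i in range(200)]
s = [10 * t ** 3 - 15 * t ** 4 + 6 * t ** 5 for t in ts]
demo = [3 * v - 2 * v * v for v in s]

score, n_basis, alpha_z, traj = auto_tune(demo, y0=0.0, goal=3.0)
print(f"n_basis={n_basis}, alpha_z={alpha_z}, shape error={score:.2e}")
```

A grid search is used here only because it is the simplest way to show evaluative feedback driving hyperparameter selection; the abstract does not state which optimizer the authors use.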


Source journal

IEEE Transactions on Automation Science and Engineering (Engineering & Technology - Automation & Control Systems)

CiteScore: 12.50

Self-citation rate: 14.30%

Articles published: 404

Review time: 3.0 months

About the journal: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.