Curriculum-guided skill learning for long-horizon robot manipulation tasks

IF 4.3, Zone 2 (Computer Science), Q1 AUTOMATION & CONTROL SYSTEMS

Robotics and Autonomous Systems Pub Date : 2025-04-30 DOI:10.1016/j.robot.2025.105032

João Bernardo Alves, Nuno Lau, Filipe Silva

Volume 192, Article 105032. https://www.sciencedirect.com/science/article/pii/S0921889025001186

Citations: 0

Abstract

Robotic tasks often involve solving long-horizon problems. Viewed within the reinforcement learning framework, these problems often provide only sparse rewards, which can hinder learning. In this context, dividing the long-horizon task into smaller sub-tasks is a viable strategy for alleviating the credit assignment problem. Another approach commonly used to address this problem is curriculum learning. This paper combines both with a new skill-chaining learning algorithm that provides transition policies to bridge the gap between skills. Our approach begins by using a heuristic method to extract meaningful skills from the states of an expert trajectory; these skills are subsequently used by the skill learning and skill chaining algorithms. By leveraging the sequential order of the skills within the demonstration, we propose a method for learning inter-skill transition policies that ensure the skills are appropriately chained. Our curriculum-based training approach enables an agent to learn action sequences that generalize within a specific sub-task context. Using the information from a single demonstration, we show that our approach can solve a robotic manipulation task with performance similar to that of methods that rely on a large amount of data. Because our skill segmentation method detects which skills are present across demonstrations, we also show that our approach can reuse previously learned skills in a zero-shot way.
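To make the segmentation and chaining steps described in the abstract concrete, here is a minimal sketch. The abstract does not specify the heuristic, so this example assumes a purely hypothetical one: a skill boundary is placed wherever the gripper's open/closed flag changes in the demonstration. The names `Skill`, `segment_by_gripper`, `transition_pairs`, and the labels "reach" and "carry" are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Skill:
    name: str   # hypothetical label for the segment
    start: int  # index of the first demonstration state in the segment
    end: int    # index of the last demonstration state (inclusive)


def _make_skill(closed: bool, start: int, end: int) -> Skill:
    # Assumed labelling: closed gripper means "carry", open means "reach".
    return Skill("carry" if closed else "reach", start, end)


def segment_by_gripper(gripper_closed: Sequence[bool]) -> List[Skill]:
    """Split one demonstration wherever the gripper flag changes.

    This boundary rule is an assumption made for illustration only.
    """
    if not gripper_closed:
        return []
    skills: List[Skill] = []
    start = 0
    for t in range(1, len(gripper_closed)):
        if gripper_closed[t] != gripper_closed[t - 1]:
            skills.append(_make_skill(gripper_closed[start], start, t - 1))
            start = t
    skills.append(_make_skill(gripper_closed[start], start, len(gripper_closed) - 1))
    return skills


def transition_pairs(skills: List[Skill]) -> List[Tuple[Skill, Skill]]:
    """Pair consecutive skills in demonstration order.

    Each pair marks a gap that one inter-skill transition policy would bridge.
    """
    return list(zip(skills, skills[1:]))


# One demonstration, reduced to its gripper flag per time step.
demo = [False, False, False, True, True, False, False]
skills = segment_by_gripper(demo)
pairs = transition_pairs(skills)
```

In this sketch the demonstration's order alone determines which transitions need to be learned: a chain of n skills yields n - 1 consecutive pairs. The repeated "reach" label also hints at how a segment type detected more than once could be a candidate for reuse, although how the paper actually matches skills across demonstrations is not stated in the abstract.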




Source journal

Robotics and Autonomous Systems (Engineering & Technology: Robotics)

CiteScore

9.00

Self-citation rate

7.00%

Articles published

164

Review time

4.5 months

About the journal: Robotics and Autonomous Systems will carry articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of this journal is to extend the state of the art in both symbolic and sensory-based robot control and learning in the context of autonomous systems. Robotics and Autonomous Systems will carry articles on the theoretical, computational and experimental aspects of autonomous systems, or modules of such systems.