Data efficient online learning of robot behaviours via qualitative planning and reinforcement learning

Timothy Wiley, Claude Sammut

Robotics and Autonomous Systems, Volume 194, Article 105122 (published 2025-07-17). DOI: 10.1016/j.robot.2025.105122
Autonomous robots execute complex behaviours to perform tasks in real-world environments. Reinforcement learning can acquire such behaviours; however, it often requires a large number of iterations to reach an operational behaviour. This makes it inefficient for online learning, that is, learning on board the robot as it operates. Combinations of techniques such as model-based reinforcement learning, planning, and behavioural cloning attempt to narrow the search space of trial-and-error learning, but they rely on a significant degree of domain knowledge. We develop a domain-independent Data Efficient Planning and Learning Architecture for online skill acquisition, which we apply to locomotion tasks on a multi-tracked robot typical of those designed for urban search and rescue. We build a qualitative model of the robot's dynamics from online behavioural traces, which trades accuracy for domain independence by elevating the skill acquisition problem into a symbolic representation. A forward-chaining planner then finds an operational sequence of qualitative symbolic actions enabling the robot to complete a task, from which quantitative action parameters representing the robot's actuator movements are extracted. The qualitative plan places constraints on valid parameter values, enabling online reinforcement learning to refine the parameters into satisficing (or optimal) actuator movements and making trial-and-error learning data efficient in terms of the number of trials. By applying our architecture in a "closed loop", the qualitative model is improved from the reinforcement learning trials, refining the robot's final operation and discovering new emergent behaviours.
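The plan-then-refine idea in the abstract can be sketched in a few lines: forward-chain over qualitative actions to obtain a symbolic plan, then search for numeric parameters only within the intervals that plan permits. The states, actions, parameter ranges, and objective below are hypothetical toy examples for illustration, not the paper's actual model of the rescue robot.

```python
import random
from collections import deque

# Hypothetical qualitative actions: each maps a symbolic state to a successor
# state and constrains one numeric actuator parameter to an interval.
ACTIONS = {
    "raise_flippers": {"pre": "flat", "post": "tilted", "param_range": (10.0, 45.0)},
    "drive_forward": {"pre": "tilted", "post": "on_step", "param_range": (0.1, 0.5)},
}

def forward_chain(start, goal, actions):
    """Breadth-first forward chaining: find a sequence of qualitative
    actions transforming the start state into the goal state."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        for name, act in actions.items():
            if act["pre"] == state and act["post"] not in visited:
                visited.add(act["post"])
                frontier.append((act["post"], plan + [name]))
    return None

def refine_parameters(plan, actions, evaluate, trials=50, seed=0):
    """Trial-and-error refinement: sample each action's parameter only
    inside the interval the qualitative plan allows, keeping the
    best-scoring parameter set."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        params = {name: rng.uniform(*actions[name]["param_range"]) for name in plan}
        score = evaluate(params)
        if score > best_score:
            best, best_score = params, score
    return best

plan = forward_chain("flat", "on_step", ACTIONS)

# Toy objective: a flipper angle near 30 degrees and a speed near 0.3 score best.
best = refine_parameters(
    plan, ACTIONS,
    lambda p: -abs(p["raise_flippers"] - 30.0) - abs(p["drive_forward"] - 0.3),
)
```

The data efficiency comes from the plan: random search only ever samples inside the qualitative constraints, so far fewer trials are wasted on parameter values that could never complete the task.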
Journal introduction:
Robotics and Autonomous Systems will carry articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of this journal is to extend the state of the art in both symbolic and sensory based robot control and learning in the context of autonomous systems.
The journal also carries articles on the theoretical, computational and experimental aspects of autonomous systems, or modules of such systems.