Data efficient online learning of robot behaviours via qualitative planning and reinforcement learning

Timothy Wiley, Claude Sammut

Robotics and Autonomous Systems, Volume 194, Article 105122 (published 2025-07-17). DOI: 10.1016/j.robot.2025.105122
Autonomous robots execute complex behaviours to perform tasks in real-world environments. Reinforcement learning can acquire such behaviours; however, it often requires a large number of iterations to reach an operational behaviour. This makes it inefficient for online learning, that is, learning on board the robot as it operates. Combinations of techniques such as model-based reinforcement learning, planning, and behavioural cloning attempt to narrow the search space of trial-and-error learning, but they rely on a significant degree of domain knowledge. We develop a domain-independent Data Efficient Planning and Learning Architecture for online skill acquisition, which we apply to locomotion tasks on a multi-tracked robot typical of those designed for urban search and rescue. We build a qualitative model of the robot's dynamics from online behavioural traces, which trades accuracy for domain independence by elevating the skill acquisition problem into a symbolic representation. A forward-chaining planner then finds an operational sequence of qualitative symbolic actions enabling the robot to complete a task, from which quantitative action parameters representing the robot's actuator movements are extracted. The qualitative plan places constraints on valid parameter values, enabling online reinforcement learning to refine the parameters into satisficing (or optimal) actuator movements and making trial-and-error learning data efficient in terms of the number of trials. By applying our architecture in a "closed loop", the qualitative model is improved from the reinforcement learning trials, refining the robot's final operation and discovering new emergent behaviours.
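The plan-then-refine idea in the abstract can be sketched in a few lines: forward-chain over qualitative actions to obtain a symbolic plan, then search for numeric parameters only within the intervals that plan permits. The states, actions, parameter ranges, and objective below are hypothetical toy examples for illustration, not the paper's actual model of the rescue robot.

```python
import random
from collections import deque

# Hypothetical qualitative actions: each maps a symbolic state to a successor
# state and constrains one numeric actuator parameter to an interval.
ACTIONS = {
    "raise_flippers": {"pre": "flat", "post": "tilted", "param_range": (10.0, 45.0)},
    "drive_forward": {"pre": "tilted", "post": "on_step", "param_range": (0.1, 0.5)},
}

def forward_chain(start, goal, actions):
    """Breadth-first forward chaining: find a sequence of qualitative
    actions transforming the start state into the goal state."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        for name, act in actions.items():
            if act["pre"] == state and act["post"] not in visited:
                visited.add(act["post"])
                frontier.append((act["post"], plan + [name]))
    return None

def refine_parameters(plan, actions, evaluate, trials=50, seed=0):
    """Trial-and-error refinement: sample each action's parameter only
    inside the interval the qualitative plan allows, keeping the
    best-scoring parameter set."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        params = {name: rng.uniform(*actions[name]["param_range"]) for name in plan}
        score = evaluate(params)
        if score > best_score:
            best, best_score = params, score
    return best

plan = forward_chain("flat", "on_step", ACTIONS)

# Toy objective: a flipper angle near 30 degrees and a speed near 0.3 score best.
best = refine_parameters(
    plan, ACTIONS,
    lambda p: -abs(p["raise_flippers"] - 30.0) - abs(p["drive_forward"] - 0.3),
)
```

The data efficiency comes from the plan: random search only ever samples inside the qualitative constraints, so far fewer trials are wasted on parameter values that could never complete the task.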
Journal introduction:
Robotics and Autonomous Systems will carry articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of this journal is to extend the state of the art in both symbolic and sensory based robot control and learning in the context of autonomous systems.
The journal also carries articles on the theoretical, computational and experimental aspects of autonomous systems, or modules of such systems.