Learning vision-based robotic manipulation tasks sequentially in offline reinforcement learning settings

IF 2.7, Zone 4 (Computer Science), Q3 ROBOTICS

Robotica Pub Date : 2024-05-02 DOI:10.1017/s0263574724000389

Sudhir Pratap Yadav, Rajendra Nagar, Suril V. Shah


Citations: 0

Abstract

With the rise of deep reinforcement learning (RL) methods, many complex robotic manipulation tasks are being solved. However, harnessing the full power of deep learning requires large datasets. Online RL does not fit readily into this paradigm because agent-environment interaction is costly and time-consuming. Therefore, many offline RL algorithms have recently been proposed to learn robotic tasks. However, most such methods focus on single-task or multitask learning, which requires retraining whenever a new task must be learned. Continually learning tasks without forgetting previous knowledge, combined with the power of offline deep RL, would allow us to scale the number of tasks by adding them one after another. This paper investigates the effectiveness of regularisation-based methods such as synaptic intelligence for sequentially learning image-based robotic manipulation tasks in an offline RL setup. We evaluate the performance of this combined framework against two common challenges of sequential learning: catastrophic forgetting and forward knowledge transfer. We performed experiments with different task combinations to analyse the effect of task ordering. We also investigated the effect of the number of object configurations and the density of robot trajectories. We found that learning tasks sequentially helps retain knowledge from previous tasks, thereby reducing the time required to learn a new task. Regularisation-based approaches to continual learning, such as the synaptic intelligence method, help mitigate catastrophic forgetting but have shown only limited transfer of knowledge from previous tasks.
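For readers unfamiliar with synaptic intelligence, the following is a minimal sketch of the regulariser in its commonly published form: each parameter accumulates an importance score from how much it contributed to reducing the loss on earlier tasks, and a quadratic penalty then discourages important parameters from moving during later tasks. This is not the paper's implementation. The abstract does not specify the network, the offline RL algorithm, or any hyperparameters, so the flat NumPy parameter vector, the toy quadratic "tasks", and the names `SynapticIntelligence`, `train_task`, `run_two_tasks`, `c`, and `xi` are all assumptions made for illustration.

```python
import numpy as np


class SynapticIntelligence:
    """Per-parameter importance tracking for a flat parameter vector (toy sketch)."""

    def __init__(self, theta, c=1.0, xi=1e-3):
        self.c = c                          # penalty strength (hypothetical value)
        self.xi = xi                        # damping term, avoids division by zero
        self.anchor = theta.copy()          # parameters at the end of the last task
        self.task_start = theta.copy()      # parameters at the start of this task
        self.w = np.zeros_like(theta)       # path integral for the current task
        self.omega = np.zeros_like(theta)   # consolidated importance of past tasks

    def accumulate(self, task_grad, theta_old, theta_new):
        # Credit each parameter with -gradient * movement, i.e. its share of
        # the task-loss decrease produced by this update step.
        self.w += -task_grad * (theta_new - theta_old)

    def penalty(self, theta):
        # Quadratic surrogate loss pulling important parameters to the anchor.
        return self.c * np.sum(self.omega * (theta - self.anchor) ** 2)

    def penalty_grad(self, theta):
        return 2.0 * self.c * self.omega * (theta - self.anchor)

    def end_task(self, theta):
        # Normalise the path integral by the squared total movement, then
        # fold it into the running importance and reset for the next task.
        delta = theta - self.task_start
        self.omega += self.w / (delta ** 2 + self.xi)
        self.anchor = theta.copy()
        self.task_start = theta.copy()
        self.w = np.zeros_like(theta)


def train_task(theta, target, mask, si, lr=0.1, steps=200):
    """Gradient descent on the toy loss 0.5 * sum(mask * (theta - target)**2)."""
    for _ in range(steps):
        task_grad = mask * (theta - target)
        new_theta = theta - lr * (task_grad + si.penalty_grad(theta))
        si.accumulate(task_grad, theta, new_theta)
        theta = new_theta
    si.end_task(theta)
    return theta


def run_two_tasks(c):
    theta = np.zeros(2)
    si = SynapticIntelligence(theta, c=c)
    # Task A only depends on parameter 0 and wants it at 1.
    theta = train_task(theta, np.array([1.0, 0.0]), np.array([1.0, 0.0]), si)
    omega_after_a = si.omega.copy()
    # Task B wants parameter 0 back at 0 and parameter 1 at 1.
    theta = train_task(theta, np.array([0.0, 1.0]), np.array([1.0, 1.0]), si)
    return theta, omega_after_a


if __name__ == "__main__":
    for c in (0.0, 1.0):
        theta, omega = run_two_tasks(c)
        print(f"c={c}: theta={theta}, importance after task A={omega}")
```

With `c=0` the penalty is switched off and the second task overwrites the parameter the first task relied on. With `c=1` that parameter settles between the two task optima, while the parameter the first task never used remains free to fit the second task. In a deep RL setting the same bookkeeping would be applied to every network weight, using the gradient of the RL training loss.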




Source journal

Robotica (Engineering & Technology: Robotics)

CiteScore: 4.50

Self-citation rate: 22.20%

Articles published: 181

Review time: 9.9 months

About the journal: Robotica is a forum for the multidisciplinary subject of robotics and encourages developments, applications and research in this important field of automation and robotics with regard to industry, health, education and economic and social aspects of relevance. Coverage includes activities in hostile environments, applications in the service and manufacturing industries, biological robotics, dynamics and kinematics involved in robot design and uses, on-line robots, robot task planning, rehabilitation robotics, sensory perception, software in the widest sense, particularly in respect of programming languages and links with CAD/CAM systems, telerobotics and various other areas. In addition, interest is focused on various Artificial Intelligence topics of theoretical and practical interest.