{"title":"Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence","authors":"Luo Ji, Runji Lin","doi":"arxiv-2409.07341","DOIUrl":null,"url":null,"abstract":"Interactive artificial intelligence in the motion control field is an\ninteresting topic, especially when universal knowledge is adaptive to multiple\ntasks and universal environments. Despite there being increasing efforts in the\nfield of Reinforcement Learning (RL) with the aid of transformers, most of them\nmight be limited by the offline training pipeline, which prohibits exploration\nand generalization abilities. To address this limitation, we propose the\nframework of Online Decision MetaMorphFormer (ODM) which aims to achieve\nself-awareness, environment recognition, and action planning through a unified\nmodel architecture. Motivated by cognitive and behavioral psychology, an ODM\nagent is able to learn from others, recognize the world, and practice itself\nbased on its own experience. ODM can also be applied to any arbitrary agent\nwith a multi-joint body, located in different environments, and trained with\ndifferent types of tasks using large-scale pre-trained datasets. Through the\nuse of pre-trained datasets, ODM can quickly warm up and learn the necessary\nknowledge to perform the desired task, while the target environment continues\nto reinforce the universal policy. Extensive online experiments as well as\nfew-shot and zero-shot environmental tests are used to verify ODM's performance\nand generalization ability. The results of our study contribute to the study of\ngeneral artificial intelligence in embodied and cognitive fields. 
Code,\nresults, and video examples can be found on the website\n\\url{https://rlodm.github.io/odm/}.","PeriodicalId":501031,"journal":{"name":"arXiv - CS - Robotics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Robotics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07341","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Interactive artificial intelligence for motion control is an interesting
topic, especially when universal knowledge can adapt to multiple tasks and
environments. Despite increasing efforts to apply transformers to
Reinforcement Learning (RL), most approaches are limited by an offline
training pipeline, which restricts their exploration and generalization
abilities. To address this limitation, we propose the Online Decision
MetaMorphFormer (ODM) framework, which aims to achieve self-awareness,
environment recognition, and action planning through a unified model
architecture. Motivated by cognitive and behavioral psychology, an ODM agent
can learn from others, recognize the world, and practice based on its own
experience. ODM can be applied to any agent with a multi-joint body, located
in different environments and trained on different types of tasks, using
large-scale pre-training datasets. Through pre-training, ODM quickly warms up
and acquires the knowledge needed for the target task, while interaction with
the target environment continues to reinforce the universal policy. Extensive
online experiments, together with few-shot and zero-shot environment tests,
verify ODM's performance and generalization ability. Our results contribute
to research on general artificial intelligence in embodied and cognitive
fields. Code,
results, and video examples can be found on the website
\url{https://rlodm.github.io/odm/}.
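
The two-phase pipeline the abstract describes (an offline warm-up on pre-training data, followed by continued online reinforcement in the target environment) can be sketched roughly as follows. This is a minimal toy illustration only: `Policy`, `warmup`, and `online_step` are hypothetical stand-ins for the paper's causal-transformer policy and its actual RL update, which are not specified in the abstract.

```python
import random

class Policy:
    """Stand-in for the transformer policy: a single scalar weight."""
    def __init__(self, lr=0.1):
        self.w = 0.0
        self.lr = lr

    def act(self, obs):
        return self.w * obs

    def warmup(self, demos):
        # Offline phase: regress toward expert actions (behavior-cloning-style),
        # mimicking "warm up from large-scale pre-training datasets".
        for obs, expert_action in demos:
            self.w += self.lr * (expert_action - self.act(obs)) * obs

    def online_step(self, obs, reward_fn):
        # Online phase: perturb the parameter and keep the change only if the
        # reward does not degrade -- a crude hill-climbing stand-in for the
        # continued online reinforcement of the universal policy.
        old_w, base = self.w, reward_fn(self.act(obs))
        self.w += random.gauss(0.0, 0.05)
        if reward_fn(self.act(obs)) < base:
            self.w = old_w  # revert the perturbation

# Toy expert behaves as action = 2 * obs; reward penalizes distance to it.
demos = [(o, 2.0 * o) for o in (0.5, 1.0, 1.5, 2.0)]
policy = Policy()

for _ in range(200):                      # phase 1: offline warm-up
    policy.warmup(demos)

random.seed(0)
for _ in range(500):                      # phase 2: online reinforcement
    obs = random.uniform(0.5, 2.0)
    policy.online_step(obs, lambda a, o=obs: -abs(a - 2.0 * o))

print(round(policy.w, 1))
```

The point of the sketch is only the control flow: the same policy object is first fit to demonstration data and then keeps improving from its own environment interaction, rather than being frozen after offline training.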