Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence
Luo Ji, Runji Lin
arXiv - CS - Robotics, 2024-09-11, arXiv:2409.07341
Interactive artificial intelligence for motion control is an interesting topic,
especially when universal knowledge can adapt to multiple tasks and diverse
environments. Although transformers are increasingly used in Reinforcement
Learning (RL), most such efforts may be limited by an offline training
pipeline, which restricts exploration and generalization. To address this
limitation, we propose Online Decision MetaMorphFormer (ODM), a framework that
aims to achieve self-awareness, environment recognition, and action planning
within a unified model architecture. Motivated by cognitive and behavioral
psychology, an ODM agent learns from others, recognizes the world, and
practices on the basis of its own experience. ODM can be applied to any agent
with a multi-joint body, placed in different environments, and trained on
different types of tasks using large-scale pre-training datasets. Pre-training
lets ODM warm up quickly and acquire the knowledge needed for the desired task,
while interaction with the target environment continues to reinforce the
universal policy. We verify ODM's performance and generalization ability
through extensive online experiments together with few-shot and zero-shot
environment tests. These results contribute to research on general artificial
intelligence in the embodied and cognitive fields. Code, results, and video
examples are available at https://rlodm.github.io/odm/.
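The abstract describes a two-stage recipe: warm up from pre-collected data, then continue reinforcing the policy in the target environment. A deliberately tiny sketch of that idea follows. It is not ODM and makes no claim about its implementation. A softmax policy over three discrete actions stands in for the transformer, behavior cloning on hypothetical demonstrations stands in for pre-training, and exact softmax policy-gradient ascent on a toy three-armed bandit stands in for online RL. All names and numbers (`warm_up`, `reinforce_online`, `MEAN_REWARDS`, `DEMOS`) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(logits, mean_rewards):
    """J(theta) = sum_i pi_i * r_i for a softmax policy on a bandit."""
    return sum(p * r for p, r in zip(softmax(logits), mean_rewards))


def warm_up(logits, demo_actions, lr=0.5, epochs=5):
    """Offline phase (hypothetical stand-in for pre-training): behavior
    cloning, i.e. gradient ascent on log pi(a) for each demonstrated a."""
    logits = list(logits)
    for _ in range(epochs):
        for a in demo_actions:
            probs = softmax(logits)
            # d log pi(a) / d logit_i = 1[i == a] - pi_i
            logits = [x + lr * ((1.0 if i == a else 0.0) - p)
                      for i, (x, p) in enumerate(zip(logits, probs))]
    return logits


def reinforce_online(logits, mean_rewards, lr=0.4, steps=1000):
    """Online phase (hypothetical stand-in for RL in the target env):
    exact softmax policy gradient dJ/dlogit_i = pi_i * (r_i - J).
    A real agent would estimate this gradient from sampled interaction."""
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        j = sum(p * r for p, r in zip(probs, mean_rewards))
        logits = [x + lr * p * (r - j)
                  for x, p, r in zip(logits, probs, mean_rewards)]
    return logits


# Hypothetical toy task (not from the paper): the demonstrations favor
# action 1, but action 2 pays best in the target environment.
MEAN_REWARDS = [0.1, 0.5, 0.9]
DEMOS = [1, 1, 1, 1]

pretrained = warm_up([0.0, 0.0, 0.0], DEMOS)
finetuned = reinforce_online(pretrained, MEAN_REWARDS)

print([round(p, 3) for p in softmax(pretrained)])
print([round(p, 3) for p in softmax(finetuned)])
```

The demonstrations bias the warmed-up policy toward action 1. The online phase then shifts probability mass toward action 2, which the toy environment rewards most. The exact gradient keeps the sketch deterministic. An agent acting in a real environment would instead estimate the gradient from sampled rollouts.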