An Overview of Robot Embodied Intelligence Based on Multimodal Models: Tasks, Models, and System Schemes

IF 3.7 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Intelligent Systems Pub Date : 2025-06-14 DOI:10.1155/int/5124400

Yao Cong, Hongwei Mo

{"title":"An Overview of Robot Embodied Intelligence Based on Multimodal Models: Tasks, Models, and System Schemes","authors":"Yao Cong, Hongwei Mo","doi":"10.1155/int/5124400","DOIUrl":null,"url":null,"abstract":"<div>\n <p>The exploration of embodied intelligence has garnered widespread consensus in the field of artificial intelligence (AI), aiming to achieve artificial general intelligence (AGI). Classical AI models, which rely on labeled data for learning, struggle to adapt to dynamic, unstructured environments due to their offline learning paradigms. Conversely, embodied intelligence emphasizes interactive learning, acquiring richer information through environmental interactions for training, thereby enabling autonomous learning and action. Early embodied tasks primarily centered on navigation. With the surge in popularity of large language models (LLMs), the focus shifted to integrating LLMs/multimodal large models (MLM) with robots, empowering them to tackle more intricate tasks through reasoning and planning, leveraging the prior knowledge imparted by LLM/MLM. This work reviews initial embodied tasks and corresponding research, categorizes various current embodied intelligence schemes deployed in robotics within the context of LLM/MLM, summarizes the perception–planning–action (PPA) paradigm, evaluates the performance of MLM across different schemes, and offers insights for future development directions in this domain.</p>\n </div>","PeriodicalId":14089,"journal":{"name":"International Journal of Intelligent Systems","volume":"2025 1","pages":""},"PeriodicalIF":3.7000,"publicationDate":"2025-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1155/int/5124400","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Intelligent Systems","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1155/int/5124400","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

The exploration of embodied intelligence has garnered widespread consensus in the field of artificial intelligence (AI), aiming to achieve artificial general intelligence (AGI). Classical AI models, which rely on labeled data for learning, struggle to adapt to dynamic, unstructured environments due to their offline learning paradigms. Conversely, embodied intelligence emphasizes interactive learning, acquiring richer information through environmental interactions for training, thereby enabling autonomous learning and action. Early embodied tasks primarily centered on navigation. With the surge in popularity of large language models (LLMs), the focus shifted to integrating LLMs/multimodal large models (MLM) with robots, empowering them to tackle more intricate tasks through reasoning and planning, leveraging the prior knowledge imparted by LLM/MLM. This work reviews initial embodied tasks and corresponding research, categorizes various current embodied intelligence schemes deployed in robotics within the context of LLM/MLM, summarizes the perception–planning–action (PPA) paradigm, evaluates the performance of MLM across different schemes, and offers insights for future development directions in this domain.

Abstract Image

查看原文本刊更多论文

基于多模态模型的机器人具身智能综述：任务、模型和系统方案

对具身智能的探索已经在人工智能（AI）领域获得了广泛的共识，其目标是实现人工通用智能（AGI）。经典的人工智能模型依赖于标记数据进行学习，由于其离线学习范式，难以适应动态、非结构化的环境。相反，具身智能强调互动学习，通过环境交互获取更丰富的信息进行训练，从而实现自主学习和自主行动。早期的具体化任务主要集中在导航上。随着大型语言模型（LLM）的普及，重点转移到将LLM/多模态大型模型（MLM）与机器人集成，使它们能够通过推理和规划来处理更复杂的任务，利用LLM/MLM赋予的先验知识。本文回顾了最初的具体任务和相应的研究，对目前在LLM/MLM背景下部署在机器人中的各种具体智能方案进行了分类，总结了感知-计划-行动（PPA）范式，评估了不同方案下MLM的性能，并为该领域的未来发展方向提供了见解。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

International Journal of Intelligent Systems 工程技术-计算机：人工智能

CiteScore

11.30

自引率

14.30%

发文量

304

审稿时长

9 months

期刊介绍： The International Journal of Intelligent Systems serves as a forum for individuals interested in tapping into the vast theories based on intelligent systems construction. With its peer-reviewed format, the journal explores several fascinating editorials written by today''s experts in the field. Because new developments are being introduced each day, there''s much to be learned — examination, analysis creation, information retrieval, man–computer interactions, and more. The International Journal of Intelligent Systems uses charts and illustrations to demonstrate these ground-breaking issues, and encourages readers to share their thoughts and experiences.