Vision-Language Model-Based Human-Guided Mobile Robot Navigation in an Unstructured Environment for Human-Centric Smart Manufacturing

IF 10.1 · CAS Zone 1 (Engineering & Technology) · JCR Q1 (ENGINEERING, MULTIDISCIPLINARY)

Engineering · Pub Date: 2025-07-15 · DOI: 10.1016/j.eng.2025.04.028

Tian Wang, Junming Fan, Pai Zheng, Ruqiang Yan, Lihui Wang



Abstract

In smart manufacturing, autonomous mobile robots play an indispensable role in conducting inspection and material handling operations, yet they face significant limitations regarding adaptability and resilience within unstructured environments. Vision and language navigation (VLN), a human-guided navigation paradigm, emerges as a compelling solution to these challenges. Nevertheless, VLN’s practical implementation is constrained by limited task generalization capabilities, inadequate response to diverse linguistic commands, and insufficient consideration of sensor-induced noise in environmental perception. This research addresses these limitations by introducing an innovative vision-language model (VLM)-based human-guided mobile robot navigation approach in an unstructured environment for human-centric smart manufacturing (HSM). This approach encompasses robust three-dimensional (3D) scene reconstruction through advanced point cloud techniques, zero-shot semantic segmentation via a VLM, and natural language processing through a large language model (LLM) to interpret instructions and generate control code for navigation. The system’s efficacy is validated through extensive experiments in an unstructured manufacturing setup.
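The abstract does not say which point cloud techniques are used to cope with sensor-induced noise. As a purely illustrative sketch, and not the authors' method, the following shows one common generic approach: statistical outlier removal, which drops points whose mean distance to their nearest neighbours is unusually large. The function name `remove_statistical_outliers`, the parameters `k` and `std_ratio`, and the synthetic data are all assumptions introduced here for illustration.

```python
import math
import random
import statistics


def remove_statistical_outliers(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours is unusually large.

    Hypothetical helper for illustration only; brute-force O(n^2), so suited to small clouds.
    """
    mean_knn = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn.append(sum(dists[:k]) / k)
    # Global threshold: mean of the per-point scores plus a multiple of their spread.
    threshold = statistics.fmean(mean_knn) + std_ratio * statistics.pstdev(mean_knn)
    keep = [d <= threshold for d in mean_knn]
    return [p for p, ok in zip(points, keep) if ok], keep


random.seed(0)
# Dense synthetic cluster standing in for a scanned surface (assumed data).
surface = [tuple(random.gauss(0.0, 0.05) for _ in range(3)) for _ in range(200)]
# Two far-away spurious returns standing in for sensor noise (assumed data).
spurious = [(5.0, 5.0, 5.0), (-6.0, 4.0, 7.0)]

filtered, keep = remove_statistical_outliers(surface + spurious)
print(len(filtered), "points kept out of", len(surface) + len(spurious))
```

A larger `std_ratio` makes the filter more permissive; a real pipeline would likely use a spatial index rather than brute-force distances.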

Source journal: Engineering (Environmental Science-Environmental Engineering)

Self-citation rate: 1.60%

Articles published: 335

Review time: 35 days

About the journal: Engineering, an international open-access journal initiated by the Chinese Academy of Engineering (CAE) in 2015, serves as a distinguished platform for disseminating cutting-edge advancements in engineering R&D, sharing major research outputs, and highlighting key achievements worldwide. The journal's objectives encompass reporting progress in engineering science, fostering discussions on hot topics, addressing areas of interest, challenges, and prospects in engineering development, while considering human and environmental well-being and ethics in engineering. It aims to inspire breakthroughs and innovations with profound economic and social significance, propelling them to advanced international standards and transforming them into a new productive force. Ultimately, this endeavor seeks to bring about positive changes globally, benefit humanity, and shape a new future.