Multimodal Perception and Decision-Making Systems for Complex Roads Based on Foundation Models

IF 8.6, Region 1 (Computer Science), JCR Q1 (AUTOMATION & CONTROL SYSTEMS)
Lili Fan;Yutong Wang;Hui Zhang;Changxian Zeng;Yunjie Li;Chao Gou;Hui Yu
Published in IEEE Transactions on Systems Man Cybernetics-Systems, vol. 54, no. 11, pp. 6561-6569, 2024-10-07 (journal article, not open access). DOI: 10.1109/TSMC.2024.3444277. URL: https://ieeexplore.ieee.org/document/10706115/
Citation count: 0

Abstract

Since the inception of Industry 5.0 in 2021, a growing number of researchers have turned their attention to the revolutionary shift it brings. The principles of Industry 5.0, including human-centricity, sustainability, and an emphasis on ecological and social values, will become the new paradigm for future industrial development. In this transformative landscape, artificial intelligence (AI) plays a pivotal role, and foundation models based on ChatGPT are set to reshape the organizational structure of industries. In this article, we introduce a multimodal perception and decision-making system built upon a foundation model. This system integrates image and point cloud data to enhance perception accuracy and provide ample information for decision making. It is designed to achieve a deep integration of AI and human-centric autonomous driving within the context of Industry 5.0. We introduce a cross-domain learning approach in the system architecture, along with a method for training models from foundation models to handle complex road conditions. The proposed method enables drivable-area segmentation on complex unstructured roads. To address the increased variance caused by the residual structure employed in previous works, this article introduces a distribution correction module, which effectively mitigates this problem. Furthermore, to achieve high-performance perception in intricate road scenarios, we put forth a multimodal perception fusion method. The experiments demonstrate the superiority of this approach over single-sensor perception. This work contributes to the ongoing discourse on the convergence of AI, human-centric values, and advanced driving systems within the framework of Industry 5.0.
Source journal
IEEE Transactions on Systems Man Cybernetics-Systems (AUTOMATION & CONTROL SYSTEMS; COMPUTER SCIENCE, CYBERNETICS)
CiteScore: 18.50
Self-citation rate: 11.50%
Articles published: 812
Review time: 6 months
Journal description: The IEEE Transactions on Systems, Man, and Cybernetics: Systems encompasses the fields of systems engineering, covering issue formulation, analysis, and modeling throughout the systems engineering lifecycle phases. It addresses decision-making, issue interpretation, systems management, processes, and various methods such as optimization, modeling, and simulation in the development and deployment of large systems.