Toward Effective 3D Object Detection via Multimodal Fusion to Automatic Driving for Industrial Cyber-Physical Systems

Honghao Gao;Yan Sun;Junsheng Xiao;Danqing Fang;Yueshen Xu;Wei Wei
{"title":"Toward Effective 3D Object Detection via Multimodal Fusion to Automatic Driving for Industrial Cyber-Physical Systems","authors":"Honghao Gao;Yan Sun;Junsheng Xiao;Danqing Fang;Yueshen Xu;Wei Wei","doi":"10.1109/TICPS.2024.3427060","DOIUrl":null,"url":null,"abstract":"AI-empowered automatic driving has experienced rapid development in industrial cyber-physical systems (CPSs), especially in safety vehicles and driverless technologies. 3D object detection is an important task for perceiving the surrounding environment and supporting decision-making when vehicles are on the road, and is also a focus in CPSs. Light detection and ranging (LiDAR)-based detection methods usually lack semantic information, resulting in high uncertainty with incorrect outputs. Thus, handling complex road scenes is difficult. Some data fusion-based methods have been developed to solve these issues. However, the spatiotemporal data misalignment between different sensors is prone to losing information during data fusion. This paper proposes exploiting multimodal information to learn more high-level features to address these issues thus reducing the uncertainty of 3D object detection. First, the VxMLA (voxel and multilevel attention) framework is employed to improve point cloud identification and modeling during 3D object detection. Second, the MF-CAMRL (modal fusion-based channel attention and multidimensional regression loss) model is proposed with two subnetworks. Our model encompasses two strategies, i.e., a multimodal fusion and a deep learning model based on CAMRL. One focuses on the semantic complementarity and geometric proximity for decisionlevel fusion. The other focuses on weighted ensemble bounding boxes to fully utilize the highlevel decision information derived from both modalities and reduce the information loss incurred during modal fusion. Finally, sufficient experiments are performed on the KITTI dataset and presented. The results show that our method is superior to baseline methods.","PeriodicalId":100640,"journal":{"name":"IEEE Transactions on Industrial Cyber-Physical Systems","volume":"2 ","pages":"281-291"},"PeriodicalIF":0.0000,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Industrial Cyber-Physical Systems","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10596943/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

AI-empowered automatic driving has developed rapidly within industrial cyber-physical systems (CPSs), especially in vehicle safety and driverless technologies. 3D object detection is an important task for perceiving the surrounding environment and supporting decision-making while a vehicle is on the road, and it is also a focus in CPSs. Detection methods based on light detection and ranging (LiDAR) usually lack semantic information, resulting in high uncertainty and incorrect outputs, which makes complex road scenes difficult to handle. Data fusion-based methods have been developed to address these issues; however, spatiotemporal misalignment between different sensors tends to lose information during fusion. This paper proposes exploiting multimodal information to learn richer high-level features, thereby reducing the uncertainty of 3D object detection. First, the VxMLA (voxel and multilevel attention) framework is employed to improve point cloud identification and modeling during 3D object detection. Second, the MF-CAMRL (modal fusion-based channel attention and multidimensional regression loss) model is proposed with two subnetworks. Our model combines two strategies: multimodal fusion and a deep learning model based on CAMRL. The former exploits semantic complementarity and geometric proximity for decision-level fusion; the latter uses weighted ensemble bounding boxes to fully utilize the high-level decision information derived from both modalities and to reduce the information lost during modal fusion. Finally, extensive experiments on the KITTI dataset show that our method outperforms the baseline methods.
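The abstract does not specify how the channel attention in MF-CAMRL is built. Below is a minimal PyTorch sketch of the standard squeeze-and-excitation style of channel attention that such a fusion subnetwork could use to reweight fused LiDAR/camera feature channels; the module name, reduction ratio, and (N, C, H, W) tensor layout are assumptions for illustration, not the paper's exact design.

```python
# A minimal sketch of squeeze-and-excitation-style channel attention,
# assuming the fused LiDAR/camera feature map is an (N, C, H, W) tensor.
# Module and parameter names are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Squeeze: global average pooling collapses each channel to a scalar.
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: a bottleneck MLP produces per-channel weights in (0, 1).
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        w = self.pool(x).view(n, c)        # (N, C) channel descriptors
        w = self.fc(w).view(n, c, 1, 1)    # (N, C, 1, 1) attention weights
        return x * w                       # reweight channels of the fused features
```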
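Likewise, the "weighted ensemble bounding boxes" suggest a decision-level scheme in the spirit of weighted boxes fusion (Solovyev et al.). The sketch below is a generic, simplified version of that idea, assuming axis-aligned 2D (x1, y1, x2, y2) boxes with confidence scores from each modality; it is not the paper's exact MF-CAMRL procedure.

```python
# A minimal sketch of score-weighted box ensembling for decision-level
# fusion of LiDAR and camera detections. Thresholds and the 2D box format
# are illustrative assumptions.
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def fuse_boxes(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.55):
    """Greedily cluster boxes from both modalities by IoU, then replace each
    cluster with the score-weighted average box and the mean score."""
    order = np.argsort(-scores)            # visit boxes in descending score
    clusters = []                          # each cluster: list of (box, score)
    for i in order:
        for cluster in clusters:
            if iou(cluster[0][0], boxes[i]) > iou_thr:
                cluster.append((boxes[i], scores[i]))
                break
        else:
            clusters.append([(boxes[i], scores[i])])
    fused_boxes, fused_scores = [], []
    for cluster in clusters:
        b = np.stack([c[0] for c in cluster])
        s = np.array([c[1] for c in cluster])
        fused_boxes.append((b * s[:, None]).sum(0) / s.sum())
        fused_scores.append(s.mean())
    return np.stack(fused_boxes), np.array(fused_scores)

# Example: fuse overlapping LiDAR and camera detections of the same object.
lidar = np.array([[10, 10, 50, 50]], dtype=float)
camera = np.array([[12, 11, 52, 49]], dtype=float)
boxes, scores = fuse_boxes(np.vstack([lidar, camera]), np.array([0.9, 0.7]))
```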