Emerging Trends in Autonomous Vehicle Perception: Multimodal Fusion for 3D Object Detection

Impact Factor: 2.6 · Q2 (Engineering, Electrical & Electronic)
S. Y. Alaba, Ali C. Gurbuz, John E. Ball
Journal: World Electric Vehicle Journal
DOI: 10.3390/wevj15010020
Published: 2024-01-07
Citations: 0

Abstract

The pursuit of autonomous driving relies on developing perception systems capable of making accurate, robust, and rapid decisions to interpret the driving environment effectively. At the core of these systems, object detection is crucial for understanding the environment. While 2D object detection and classification have advanced significantly with the advent of deep learning (DL) in computer vision (CV) applications, they fall short in providing essential depth information, a key element in comprehending driving environments. Consequently, 3D object detection becomes a cornerstone for autonomous driving and robotics, offering precise estimations of object locations and enhancing environmental comprehension. The CV community’s growing interest in 3D object detection is fueled by the evolution of DL models, including Convolutional Neural Networks (CNNs) and Transformer networks. Despite these advancements, challenges such as varying object scales, limited 3D sensor data, and occlusions persist in 3D object detection. To address these challenges, researchers are exploring multimodal techniques that combine information from multiple sensors, such as cameras, radar, and LiDAR, to enhance the performance of perception systems. This survey provides an exhaustive review of multimodal fusion-based 3D object detection methods, focusing on CNN and Transformer-based models. It underscores the necessity of equipping fully autonomous vehicles with diverse sensors to ensure robust and reliable operation. The survey explores the advantages and drawbacks of cameras, LiDAR, and radar sensors. Additionally, it summarizes autonomy datasets and examines the latest advancements in multimodal fusion-based methods. The survey concludes by highlighting the ongoing challenges, open issues, and potential directions for future research.
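The survey reviews many fusion architectures (early, deep, and late fusion, with CNN and Transformer backbones). As a minimal, hedged illustration of the general idea, not any specific method from the paper, the sketch below shows a simple late-fusion scheme that merges per-object detections from a camera pipeline and a LiDAR pipeline by spatial matching; the `Detection` type, `fuse_detections` function, and the noisy-OR score combination are illustrative assumptions, not constructs from the survey.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    center: tuple   # (x, y, z) in a shared vehicle coordinate frame, meters
    score: float    # detector confidence in [0, 1]
    label: str      # object class, e.g. "car"

def fuse_detections(cam: List[Detection], lidar: List[Detection],
                    match_radius: float = 1.0) -> List[Detection]:
    """Greedy late fusion: camera and LiDAR detections whose centers lie
    within match_radius and share a label are merged (centers averaged,
    scores combined); unmatched detections pass through unchanged."""
    fused, used = [], set()
    for c in cam:
        best, best_d = None, match_radius
        for j, l in enumerate(lidar):
            if j in used or l.label != c.label:
                continue
            d = sum((a - b) ** 2 for a, b in zip(c.center, l.center)) ** 0.5
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            l = lidar[best]
            used.add(best)
            center = tuple((a + b) / 2 for a, b in zip(c.center, l.center))
            # Noisy-OR: independent evidence from either sensor raises confidence.
            score = 1 - (1 - c.score) * (1 - l.score)
            fused.append(Detection(center, score, c.label))
        else:
            fused.append(c)  # camera-only detection (e.g. beyond LiDAR range)
    fused.extend(l for j, l in enumerate(lidar) if j not in used)
    return fused
```

Late fusion of this kind is the loosest coupling the survey discusses; feature-level (deep) fusion instead combines intermediate network features and can recover objects that either single-sensor detector misses entirely.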
Source Journal: World Electric Vehicle Journal (Engineering: Automotive Engineering)
CiteScore: 4.50
Self-citation rate: 8.70%
Articles published: 196
Review time: 8 weeks