{"title":"RaLiBEV:无锚箱目标检测系统的雷达和激光雷达BEV融合学习","authors":"Yanlong Yang;Jianan Liu;Tao Huang;Qing-Long Han;Gang Ma;Bing Zhu","doi":"10.1109/TCSVT.2024.3521375","DOIUrl":null,"url":null,"abstract":"In autonomous driving, LiDAR and radar are crucial for environmental perception. LiDAR offers precise 3D spatial sensing information but struggles in adverse weather like fog. Conversely, radar signals can penetrate rain or mist due to their specific wavelength but are prone to noise disturbances. Recent state-of-the-art works reveal that the fusion of radar and LiDAR can lead to robust detection in adverse weather. Current approaches typically fuse features from various data sources using basic convolutional/transformer network architectures and employ straightforward label assignment strategies for object detection. However, these methods have two main limitations: they fail to adequately capture feature interactions and lack consistent regression constraints. In this paper, we propose a bird’s-eye view fusion learning-based anchor box-free object detection system. Our approach introduces a novel interactive transformer module for enhanced feature fusion and an advanced label assignment strategy for more consistent regression, addressing key limitations in existing methods. Specifically, experiments show that, our approach’s average precision ranks <inline-formula> <tex-math>$1^{st}$ </tex-math></inline-formula> and significantly outperforms the state-of-the-art method by 13.1% and 19.0% at Intersection of Union (IoU) of 0.8 under “Clear+Foggy” training conditions for “Clear” and “Foggy” testing, respectively. Our code repository is available at: <uri>https://github.com/yyxr75/RaLiBEV</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4130-4143"},"PeriodicalIF":8.3000,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"RaLiBEV: Radar and LiDAR BEV Fusion Learning for Anchor Box Free Object Detection Systems\",\"authors\":\"Yanlong Yang;Jianan Liu;Tao Huang;Qing-Long Han;Gang Ma;Bing Zhu\",\"doi\":\"10.1109/TCSVT.2024.3521375\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In autonomous driving, LiDAR and radar are crucial for environmental perception. LiDAR offers precise 3D spatial sensing information but struggles in adverse weather like fog. Conversely, radar signals can penetrate rain or mist due to their specific wavelength but are prone to noise disturbances. Recent state-of-the-art works reveal that the fusion of radar and LiDAR can lead to robust detection in adverse weather. Current approaches typically fuse features from various data sources using basic convolutional/transformer network architectures and employ straightforward label assignment strategies for object detection. However, these methods have two main limitations: they fail to adequately capture feature interactions and lack consistent regression constraints. In this paper, we propose a bird’s-eye view fusion learning-based anchor box-free object detection system. Our approach introduces a novel interactive transformer module for enhanced feature fusion and an advanced label assignment strategy for more consistent regression, addressing key limitations in existing methods. 
Specifically, experiments show that, our approach’s average precision ranks <inline-formula> <tex-math>$1^{st}$ </tex-math></inline-formula> and significantly outperforms the state-of-the-art method by 13.1% and 19.0% at Intersection of Union (IoU) of 0.8 under “Clear+Foggy” training conditions for “Clear” and “Foggy” testing, respectively. Our code repository is available at: <uri>https://github.com/yyxr75/RaLiBEV</uri>.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 5\",\"pages\":\"4130-4143\"},\"PeriodicalIF\":8.3000,\"publicationDate\":\"2024-12-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10812016/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10812016/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
In autonomous driving, LiDAR and radar are crucial for environmental perception. LiDAR offers precise 3D spatial sensing information but struggles in adverse weather such as fog. Conversely, radar signals can penetrate rain or mist thanks to their longer wavelength, but they are prone to noise disturbances. Recent state-of-the-art works reveal that the fusion of radar and LiDAR can lead to robust detection in adverse weather. Current approaches typically fuse features from various data sources using basic convolutional/transformer network architectures and employ straightforward label assignment strategies for object detection. However, these methods have two main limitations: they fail to adequately capture feature interactions and lack consistent regression constraints. In this paper, we propose a bird’s-eye view fusion learning-based anchor box-free object detection system. Our approach introduces a novel interactive transformer module for enhanced feature fusion and an advanced label assignment strategy for more consistent regression, addressing key limitations in existing methods. Specifically, experiments show that our approach’s average precision ranks 1st and significantly outperforms the state-of-the-art method by 13.1% and 19.0% at an Intersection over Union (IoU) threshold of 0.8 under “Clear+Foggy” training conditions for “Clear” and “Foggy” testing, respectively. Our code repository is available at: https://github.com/yyxr75/RaLiBEV.
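The abstract refers to an interactive transformer module that fuses radar and LiDAR bird's-eye-view (BEV) features, but this listing gives no implementation details. The following PyTorch sketch is only a rough illustration of what bidirectional cross-attention between two BEV feature maps on a shared grid can look like; the class name BEVCrossAttentionFusion, the channel and head sizes, and the layer layout are assumptions made for this example and do not describe the actual RaLiBEV architecture.

# Hypothetical sketch: cross-attention fusion of radar and LiDAR BEV features.
# Names and sizes are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class BEVCrossAttentionFusion(nn.Module):
    """Fuses two BEV feature maps by letting each modality attend to the other."""

    def __init__(self, channels: int = 128, num_heads: int = 4):
        super().__init__()
        # Cross-attention in both directions: LiDAR queries radar, radar queries LiDAR.
        self.lidar_to_radar = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.radar_to_lidar = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # 1x1 convolution merges the two enhanced maps back to a single BEV tensor.
        self.out_proj = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, lidar_bev: torch.Tensor, radar_bev: torch.Tensor) -> torch.Tensor:
        # lidar_bev, radar_bev: (B, C, H, W) feature maps on a shared BEV grid.
        b, c, h, w = lidar_bev.shape
        lidar_seq = lidar_bev.flatten(2).transpose(1, 2)  # (B, H*W, C)
        radar_seq = radar_bev.flatten(2).transpose(1, 2)  # (B, H*W, C)

        # Each modality uses the other as keys/values.
        lidar_enh, _ = self.lidar_to_radar(lidar_seq, radar_seq, radar_seq)
        radar_enh, _ = self.radar_to_lidar(radar_seq, lidar_seq, lidar_seq)

        # Restore the spatial layout and merge the two enhanced maps.
        lidar_enh = lidar_enh.transpose(1, 2).reshape(b, c, h, w)
        radar_enh = radar_enh.transpose(1, 2).reshape(b, c, h, w)
        return self.out_proj(torch.cat([lidar_enh, radar_enh], dim=1))

if __name__ == "__main__":
    fusion = BEVCrossAttentionFusion(channels=128, num_heads=4)
    lidar = torch.randn(1, 128, 32, 32)   # toy LiDAR BEV features
    radar = torch.randn(1, 128, 32, 32)   # toy radar BEV features
    fused = fusion(lidar, radar)
    print(fused.shape)  # torch.Size([1, 128, 32, 32])

Note that dense attention over every BEV cell scales quadratically with grid size, so practical detectors usually restrict attention to local windows or sampled locations; the toy 32x32 grid above only keeps this sketch cheap to run.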
Journal introduction:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.