{"title":"RaLiBEV:无锚箱目标检测系统的雷达和激光雷达BEV融合学习","authors":"Yanlong Yang;Jianan Liu;Tao Huang;Qing-Long Han;Gang Ma;Bing Zhu","doi":"10.1109/TCSVT.2024.3521375","DOIUrl":null,"url":null,"abstract":"In autonomous driving, LiDAR and radar are crucial for environmental perception. LiDAR offers precise 3D spatial sensing information but struggles in adverse weather like fog. Conversely, radar signals can penetrate rain or mist due to their specific wavelength but are prone to noise disturbances. Recent state-of-the-art works reveal that the fusion of radar and LiDAR can lead to robust detection in adverse weather. Current approaches typically fuse features from various data sources using basic convolutional/transformer network architectures and employ straightforward label assignment strategies for object detection. However, these methods have two main limitations: they fail to adequately capture feature interactions and lack consistent regression constraints. In this paper, we propose a bird’s-eye view fusion learning-based anchor box-free object detection system. Our approach introduces a novel interactive transformer module for enhanced feature fusion and an advanced label assignment strategy for more consistent regression, addressing key limitations in existing methods. Specifically, experiments show that, our approach’s average precision ranks <inline-formula> <tex-math>$1^{st}$ </tex-math></inline-formula> and significantly outperforms the state-of-the-art method by 13.1% and 19.0% at Intersection of Union (IoU) of 0.8 under “Clear+Foggy” training conditions for “Clear” and “Foggy” testing, respectively. Our code repository is available at: <uri>https://github.com/yyxr75/RaLiBEV</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4130-4143"},"PeriodicalIF":8.3000,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"RaLiBEV: Radar and LiDAR BEV Fusion Learning for Anchor Box Free Object Detection Systems\",\"authors\":\"Yanlong Yang;Jianan Liu;Tao Huang;Qing-Long Han;Gang Ma;Bing Zhu\",\"doi\":\"10.1109/TCSVT.2024.3521375\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In autonomous driving, LiDAR and radar are crucial for environmental perception. LiDAR offers precise 3D spatial sensing information but struggles in adverse weather like fog. Conversely, radar signals can penetrate rain or mist due to their specific wavelength but are prone to noise disturbances. Recent state-of-the-art works reveal that the fusion of radar and LiDAR can lead to robust detection in adverse weather. Current approaches typically fuse features from various data sources using basic convolutional/transformer network architectures and employ straightforward label assignment strategies for object detection. However, these methods have two main limitations: they fail to adequately capture feature interactions and lack consistent regression constraints. In this paper, we propose a bird’s-eye view fusion learning-based anchor box-free object detection system. Our approach introduces a novel interactive transformer module for enhanced feature fusion and an advanced label assignment strategy for more consistent regression, addressing key limitations in existing methods. 
Specifically, experiments show that, our approach’s average precision ranks <inline-formula> <tex-math>$1^{st}$ </tex-math></inline-formula> and significantly outperforms the state-of-the-art method by 13.1% and 19.0% at Intersection of Union (IoU) of 0.8 under “Clear+Foggy” training conditions for “Clear” and “Foggy” testing, respectively. Our code repository is available at: <uri>https://github.com/yyxr75/RaLiBEV</uri>.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 5\",\"pages\":\"4130-4143\"},\"PeriodicalIF\":8.3000,\"publicationDate\":\"2024-12-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10812016/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10812016/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
In autonomous driving, LiDAR and radar are crucial for environmental perception. LiDAR offers precise 3D spatial sensing information but struggles in adverse weather such as fog. Conversely, radar signals can penetrate rain or mist thanks to their longer wavelength, but they are prone to noise disturbances. Recent state-of-the-art works reveal that the fusion of radar and LiDAR can lead to robust detection in adverse weather. Current approaches typically fuse features from various data sources using basic convolutional/transformer network architectures and employ straightforward label assignment strategies for object detection. However, these methods have two main limitations: they fail to adequately capture feature interactions and lack consistent regression constraints. In this paper, we propose a bird’s-eye view fusion learning-based anchor box-free object detection system. Our approach introduces a novel interactive transformer module for enhanced feature fusion and an advanced label assignment strategy for more consistent regression, addressing key limitations in existing methods. Specifically, experiments show that our approach’s average precision ranks 1st and significantly outperforms the state-of-the-art method by 13.1% and 19.0% at an Intersection over Union (IoU) threshold of 0.8 under “Clear+Foggy” training conditions for “Clear” and “Foggy” testing, respectively. Our code repository is available at: https://github.com/yyxr75/RaLiBEV.
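The abstract refers to an interactive transformer module that fuses radar and LiDAR bird's-eye-view (BEV) features, but this listing gives no implementation details. The following PyTorch sketch is only a rough illustration of what bidirectional cross-attention between two BEV feature maps on a shared grid can look like; the class name BEVCrossAttentionFusion, the channel and head sizes, and the layer layout are assumptions made for this example and do not describe the actual RaLiBEV architecture.

# Hypothetical sketch: cross-attention fusion of radar and LiDAR BEV features.
# Names and sizes are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class BEVCrossAttentionFusion(nn.Module):
    """Fuses two BEV feature maps by letting each modality attend to the other."""

    def __init__(self, channels: int = 128, num_heads: int = 4):
        super().__init__()
        # Cross-attention in both directions: LiDAR queries radar, radar queries LiDAR.
        self.lidar_to_radar = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.radar_to_lidar = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # 1x1 convolution merges the two enhanced maps back to a single BEV tensor.
        self.out_proj = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, lidar_bev: torch.Tensor, radar_bev: torch.Tensor) -> torch.Tensor:
        # lidar_bev, radar_bev: (B, C, H, W) feature maps on a shared BEV grid.
        b, c, h, w = lidar_bev.shape
        lidar_seq = lidar_bev.flatten(2).transpose(1, 2)  # (B, H*W, C)
        radar_seq = radar_bev.flatten(2).transpose(1, 2)  # (B, H*W, C)

        # Each modality uses the other as keys/values.
        lidar_enh, _ = self.lidar_to_radar(lidar_seq, radar_seq, radar_seq)
        radar_enh, _ = self.radar_to_lidar(radar_seq, lidar_seq, lidar_seq)

        # Restore the spatial layout and merge the two enhanced maps.
        lidar_enh = lidar_enh.transpose(1, 2).reshape(b, c, h, w)
        radar_enh = radar_enh.transpose(1, 2).reshape(b, c, h, w)
        return self.out_proj(torch.cat([lidar_enh, radar_enh], dim=1))

if __name__ == "__main__":
    fusion = BEVCrossAttentionFusion(channels=128, num_heads=4)
    lidar = torch.randn(1, 128, 32, 32)   # toy LiDAR BEV features
    radar = torch.randn(1, 128, 32, 32)   # toy radar BEV features
    fused = fusion(lidar, radar)
    print(fused.shape)  # torch.Size([1, 128, 32, 32])

Note that dense attention over every BEV cell scales quadratically with grid size, so practical detectors usually restrict attention to local windows or sampled locations; the toy 32x32 grid above only keeps this sketch cheap to run.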
Journal introduction:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.