BEVEFNet: A Multiple Object Tracking Model Based on LiDAR-Camera Fusion
Yi Yuan, Ying Liu
Procedia Computer Science, Volume 242, Pages 560-567 (2024)
DOI: 10.1016/j.procs.2024.08.106
https://www.sciencedirect.com/science/article/pii/S1877050924018258
Citation count: 0
Abstract
Object tracking is a core computer vision task with wide application in domains such as autonomous driving. However, existing multiple object tracking methods still struggle to track multiple moving targets accurately and efficiently in real time. This paper presents BEVEFNet, a camera-LiDAR multi-target tracking model based on multistage fusion. It combines the semantic information in optical images with the spatial and geometric information in LiDAR data, unifying the multi-modal features in a shared Bird's Eye View (BEV) representation space. With LiDAR data complementing the optical images, fusion is performed at both the feature level and the decision level. The proposed sparse 3D feature extraction network uses sparse convolution to significantly speed up multiple object tracking. Experiments on the nuScenes dataset show that BEVEFNet achieves an AMOTA of 69.7, improving the accuracy of multiple object tracking.
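
To make the idea of a shared BEV representation space concrete, the following is a minimal NumPy sketch, not the BEVEFNet implementation. The abstract does not give the network's architecture, so everything here is an assumption for illustration: the function names `lidar_to_bev` and `fuse_bev`, the grid extents and cell size, the two LiDAR channels (point count and maximum height), and the use of channel concatenation as the feature-level fusion step.

```python
import numpy as np


def lidar_to_bev(points, x_range=(-4.0, 4.0), y_range=(-4.0, 4.0), cell=1.0):
    """Rasterize (N, 3) LiDAR points into a 2-channel BEV grid.

    Hypothetical helper: channel 0 is the point count per cell,
    channel 1 is the maximum height (z) per cell, 0 where empty.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))

    # Discard points that fall outside the grid extents.
    inside = (
        (points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1])
        & (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1])
    )
    pts = points[inside]

    # Map metric coordinates to integer cell indices.
    ix = np.floor((pts[:, 0] - x_range[0]) / cell).astype(np.int64)
    iy = np.floor((pts[:, 1] - y_range[0]) / cell).astype(np.int64)
    ix = np.clip(ix, 0, nx - 1)
    iy = np.clip(iy, 0, ny - 1)

    # Unbuffered scatter: several points may land in the same cell.
    count = np.zeros((nx, ny), dtype=np.float32)
    np.add.at(count, (ix, iy), 1.0)

    max_z = np.full((nx, ny), -np.inf, dtype=np.float32)
    np.maximum.at(max_z, (ix, iy), pts[:, 2].astype(np.float32))
    max_z[count == 0] = 0.0

    return np.stack([count, max_z], axis=0)  # shape (2, nx, ny)


def fuse_bev(lidar_bev, camera_bev):
    """Feature-level fusion by channel concatenation (an assumed choice)."""
    if lidar_bev.shape[1:] != camera_bev.shape[1:]:
        raise ValueError("BEV grids must share the same spatial size")
    return np.concatenate([lidar_bev, camera_bev], axis=0)


points = np.array([
    [0.5, 0.5, 1.0],
    [0.2, 0.7, 2.0],    # same cell as the first point
    [-3.5, -3.5, 0.5],
    [10.0, 0.0, 0.0],   # outside the grid, dropped
])
lidar_bev = lidar_to_bev(points)
camera_bev = np.zeros((3, 8, 8), dtype=np.float32)  # stand-in for image features
fused = fuse_bev(lidar_bev, camera_bev)
occupied = int(np.count_nonzero(lidar_bev[0]))
```

In this toy example only 2 of the 64 BEV cells contain LiDAR points. Occupancy this low is the kind of sparsity that sparse convolution is designed to exploit, since it restricts computation to the non-empty cells.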