Co-Fix3D: Enhancing 3D Object Detection With Collaborative Refinement
Authors: Wenxuan Li, Qin Zou, Chi Chen, Bo Du, Long Chen, Jian Zhou, Hongkai Yu
Journal: IEEE Robotics and Automation Letters, vol. 10, no. 5, pp. 4970-4977 (Journal Article)
Published: 2025-03-28
DOI: 10.1109/LRA.2025.3555859
URL: https://ieeexplore.ieee.org/document/10945409/
Journal metrics: impact factor 4.6; JCR Q2 (Robotics); Region 2 (Computer Science)
Citations: 0
Abstract
3D object detection in driving scenarios is particularly challenging because sensor noise, occlusions, and the inherent sparsity of LiDAR point clouds can cause key features to be lost or left incomplete, which in turn degrades perception performance. To address these challenges, we propose Co-Fix3D, a detection framework that integrates Local and Global Enhancement (LGE) modules to refine Bird's Eye View (BEV) features. The LGE module uses the Discrete Wavelet Transform (DWT) to refine local features at a fine scale, helping to capture frequency details and subtle variations in the environment, and incorporates an attention mechanism to enhance global feature representations across the entire scene. Moreover, we adopt multi-head LGE modules, each concentrating on targets at a different level of detection difficulty, which further improves overall perception performance. On the nuScenes dataset, Co-Fix3D achieves new state-of-the-art (SOTA) performance of 69.4% mAP and 73.5% NDS, outperforming competing methods; on the multimodal benchmark, it achieves 72.3% mAP and 74.7% NDS.
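The abstract does not say how the LGE module applies the DWT. As a hedged illustration of the general idea only, the sketch below performs a single-level 2D Haar DWT on a BEV feature map of shape (C, H, W), splitting it into a low-frequency sub-band (LL) and three detail sub-bands (LH, HL, HH) that carry fine-scale variations, and then inverts the transform. The choice of the Haar wavelet, the function names `haar_dwt2`, `haar_idwt2`, and `refine_local`, and the fixed `detail_gain` (a stand-in for whatever learned layers the real module uses) are all assumptions, not Co-Fix3D's implementation.

```python
import numpy as np


def haar_dwt2(x: np.ndarray):
    """Single-level orthonormal 2D Haar DWT over the last two axes of x.

    x is a feature map shaped (..., H, W), e.g. (C, H, W) for a BEV map;
    H and W must be even. Sub-band naming conventions vary between libraries.
    """
    if x.shape[-2] % 2 or x.shape[-1] % 2:
        raise ValueError("spatial dimensions must be even")
    a = x[..., 0::2, 0::2]  # top-left cell of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2  # low-frequency approximation
    hl = (a - b + c - d) / 2  # differences across columns (vertical edges)
    lh = (a + b - c - d) / 2  # differences across rows (horizontal edges)
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the full-resolution map from sub-bands."""
    h, w = ll.shape[-2:]
    out = np.empty(ll.shape[:-2] + (2 * h, 2 * w), dtype=ll.dtype)
    out[..., 0::2, 0::2] = (ll + hl + lh + hh) / 2
    out[..., 0::2, 1::2] = (ll - hl + lh - hh) / 2
    out[..., 1::2, 0::2] = (ll + hl - lh - hh) / 2
    out[..., 1::2, 1::2] = (ll - hl - lh + hh) / 2
    return out


def refine_local(x: np.ndarray, detail_gain: float = 1.5) -> np.ndarray:
    """Hypothetical local refinement: scale the detail sub-bands, then invert.

    A learned module would apply trainable layers to each sub-band; the fixed
    gain here only shows where such processing would sit.
    """
    ll, lh, hl, hh = haar_dwt2(x)
    return haar_idwt2(ll, detail_gain * lh, detail_gain * hl, detail_gain * hh)


# Usage on a random stand-in for a BEV map (64 channels, 128 x 128 grid).
rng = np.random.default_rng(0)
bev = rng.standard_normal((64, 128, 128)).astype(np.float32)
ll, lh, hl, hh = haar_dwt2(bev)
print(ll.shape)  # (64, 64, 64)
assert np.allclose(haar_idwt2(ll, lh, hl, hh), bev, atol=1e-5)
print(refine_local(bev).shape)  # (64, 128, 128)
```

Because this Haar basis is orthonormal, the transform preserves feature energy and is exactly invertible, so any change in the reconstructed map comes only from what is done to the sub-bands.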
Journal description:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.