DeployFusion: A Deployable Monocular 3D Object Detection with Multi-Sensor Information Fusion in BEV for Edge Devices

IF 3.4 | CAS Tier 3 (Multidisciplinary) | JCR Q2, CHEMISTRY, ANALYTICAL
Sensors | Pub Date: 2024-10-31 | DOI: 10.3390/s24217007
Fei Huang, Shengshu Liu, Guangqian Zhang, Bingsen Hao, Yangkai Xiang, Kun Yuan
{"title":"DeployFusion:面向边缘设备的可部署单目三维物体检测与多传感器信息融合BEV。","authors":"Fei Huang, Shengshu Liu, Guangqian Zhang, Bingsen Hao, Yangkai Xiang, Kun Yuan","doi":"10.3390/s24217007","DOIUrl":null,"url":null,"abstract":"<p><p>To address the challenges of suboptimal remote detection and significant computational burden in existing multi-sensor information fusion 3D object detection methods, a novel approach based on Bird's-Eye View (BEV) is proposed. This method utilizes an enhanced lightweight EdgeNeXt feature extraction network, incorporating residual branches to address network degradation caused by the excessive depth of STDA encoding blocks. Meantime, deformable convolution is used to expand the receptive field and reduce computational complexity. The feature fusion module constructs a two-stage fusion network to optimize the fusion and alignment of multi-sensor features. This network aligns image features to supplement environmental information with point cloud features, thereby obtaining the final BEV features. Additionally, a Transformer decoder that emphasizes global spatial cues is employed to process the BEV feature sequence, enabling precise detection of distant small objects. Experimental results demonstrate that this method surpasses the baseline network, with improvements of 4.5% in the NuScenes detection score and 5.5% in average precision for detection objects. Finally, the model is converted and accelerated using TensorRT tools for deployment on mobile devices, achieving an inference time of 138 ms per frame on the Jetson Orin NX embedded platform, thus enabling real-time 3D object detection.</p>","PeriodicalId":21698,"journal":{"name":"Sensors","volume":"24 21","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11548664/pdf/","citationCount":"0","resultStr":"{\"title\":\"DeployFusion: A Deployable Monocular 3D Object Detection with Multi-Sensor Information Fusion in BEV for Edge Devices.\",\"authors\":\"Fei Huang, Shengshu Liu, Guangqian Zhang, Bingsen Hao, Yangkai Xiang, Kun Yuan\",\"doi\":\"10.3390/s24217007\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>To address the challenges of suboptimal remote detection and significant computational burden in existing multi-sensor information fusion 3D object detection methods, a novel approach based on Bird's-Eye View (BEV) is proposed. This method utilizes an enhanced lightweight EdgeNeXt feature extraction network, incorporating residual branches to address network degradation caused by the excessive depth of STDA encoding blocks. Meantime, deformable convolution is used to expand the receptive field and reduce computational complexity. The feature fusion module constructs a two-stage fusion network to optimize the fusion and alignment of multi-sensor features. This network aligns image features to supplement environmental information with point cloud features, thereby obtaining the final BEV features. Additionally, a Transformer decoder that emphasizes global spatial cues is employed to process the BEV feature sequence, enabling precise detection of distant small objects. Experimental results demonstrate that this method surpasses the baseline network, with improvements of 4.5% in the NuScenes detection score and 5.5% in average precision for detection objects. 
Finally, the model is converted and accelerated using TensorRT tools for deployment on mobile devices, achieving an inference time of 138 ms per frame on the Jetson Orin NX embedded platform, thus enabling real-time 3D object detection.</p>\",\"PeriodicalId\":21698,\"journal\":{\"name\":\"Sensors\",\"volume\":\"24 21\",\"pages\":\"\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2024-10-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11548664/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Sensors\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.3390/s24217007\",\"RegionNum\":3,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"CHEMISTRY, ANALYTICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Sensors","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.3390/s24217007","RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CHEMISTRY, ANALYTICAL","Score":null,"Total":0}
Citations: 0

Abstract


To address the challenges of suboptimal long-range detection and the heavy computational burden of existing multi-sensor information fusion 3D object detection methods, a novel approach based on the Bird's-Eye View (BEV) is proposed. The method uses an enhanced lightweight EdgeNeXt feature extraction network, incorporating residual branches to counter the network degradation caused by the excessive depth of the STDA encoding blocks. Meanwhile, deformable convolution is used to enlarge the receptive field while reducing computational complexity. The feature fusion module constructs a two-stage fusion network to optimize the fusion and alignment of multi-sensor features: image features are aligned with point cloud features so that they supplement the point cloud with environmental information, yielding the final BEV features. Additionally, a Transformer decoder that emphasizes global spatial cues processes the BEV feature sequence, enabling precise detection of distant small objects. Experimental results demonstrate that the method surpasses the baseline network, improving the NuScenes detection score by 4.5% and the average precision for detected objects by 5.5%. Finally, the model is converted and accelerated with TensorRT for deployment on edge devices, achieving an inference time of 138 ms per frame on the Jetson Orin NX embedded platform and thus enabling real-time 3D object detection.
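To make the pipeline in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of the parts that are described specifically enough to illustrate: a deformable convolution that aligns camera BEV features to the point cloud BEV grid (stage one), a lightweight fusion of the aligned features with the point cloud features (stage two), and a Transformer decoder over the flattened BEV sequence. All module names, channel widths, the query count, and the BEV grid size are assumptions for illustration; this is not the authors' actual DeployFusion implementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class BEVFusionSketch(nn.Module):
    def __init__(self, img_ch=256, pts_ch=256, bev_ch=256, num_queries=900):
        super().__init__()
        # Stage 1: predict per-location sampling offsets and align camera BEV
        # features to the LiDAR BEV grid with a deformable 3x3 convolution,
        # which enlarges the effective receptive field at modest cost.
        self.offset = nn.Conv2d(img_ch, 2 * 3 * 3, kernel_size=3, padding=1)
        self.align = DeformConv2d(img_ch, bev_ch, kernel_size=3, padding=1)
        # Stage 2: concatenate the aligned image features with the point
        # cloud features and fuse them with a 1x1 convolution.
        self.fuse = nn.Sequential(
            nn.Conv2d(bev_ch + pts_ch, bev_ch, kernel_size=1),
            nn.BatchNorm2d(bev_ch),
            nn.ReLU(inplace=True),
        )
        # Transformer decoder over the flattened BEV sequence: object queries
        # cross-attend to every BEV cell, providing the global spatial cues
        # that help with distant small objects.
        layer = nn.TransformerDecoderLayer(d_model=bev_ch, nhead=8,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.queries = nn.Embedding(num_queries, bev_ch)

    def forward(self, img_bev, pts_bev):
        # img_bev, pts_bev: (B, C, H, W) features already projected to BEV.
        aligned = self.align(img_bev, self.offset(img_bev))
        bev = self.fuse(torch.cat([aligned, pts_bev], dim=1))  # (B, C, H, W)
        seq = bev.flatten(2).transpose(1, 2)                   # (B, H*W, C)
        q = self.queries.weight.unsqueeze(0).expand(bev.size(0), -1, -1)
        return self.decoder(q, seq)                            # (B, Q, C)

# Smoke test on a coarse 64x64 BEV grid.
model = BEVFusionSketch()
out = model(torch.randn(1, 256, 64, 64), torch.randn(1, 256, 64, 64))
print(out.shape)  # torch.Size([1, 900, 256])
```

For deployment, a route consistent with the abstract would be exporting the trained model to ONNX (torch.onnx.export) and building an engine offline with TensorRT's trtexec. Note that deformable convolution is not part of the standard ONNX opset, so a custom symbolic function and a TensorRT plugin are typically needed; the exact conversion settings used by the authors are not given in the abstract.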

Source journal: Sensors (Engineering & Technology - Electrochemistry)
CiteScore: 7.30 | Self-citation rate: 12.80% | Annual output: 8430 articles | Review time: 1.7 months
Journal description: Sensors (ISSN 1424-8220) provides an advanced forum for the science and technology of sensors and biosensors. It publishes reviews (including comprehensive reviews on complete sensor products), regular research papers, and short notes. Its aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of papers; full experimental details must be provided so that the results can be reproduced.