SASENet: multimodal 3D object detection for Gm-APD LiDAR based on semantic and spatial enhancement
Yuanxue Ding, Dongyang Liu, Yanchen Qu, Dakuan Du, Guanlin Chen, Xuefeng Dong, Jianfeng Sun
Infrared Physics & Technology, Volume 151, Article 106145. Published 2025-09-08. DOI: 10.1016/j.infrared.2025.106145
https://www.sciencedirect.com/science/article/pii/S1350449525004384
Abstract
Three-dimensional (3D) object detection in point clouds, a critical component of intelligent perception, has attracted considerable research attention. However, the sparsity and lack of semantic information in point clouds generated by long-range Geiger-mode avalanche photodiode (Gm-APD) LiDAR pose significant challenges, as unimodal detection struggles to distinguish structurally similar objects. To address this limitation, we propose SASENet, a multimodal 3D object detection network that integrates semantic and spatial enhancements. Specifically, at the input stage, we introduce a Semantic Spatial Enhancement Module (SSEM). Horizontally, we align the interpolated Gm-APD LiDAR range image with the infrared image and generate semantically enhanced point clouds through semantic segmentation of the infrared image. Vertically, we upsample the sparse point clouds to obtain semantic-spatially enhanced point clouds, enriching their structural information. At the feature interaction stage, we propose a Bidirectional Feature Interaction Module (BFIM) based on a dual-stream architecture, which enhances cross-modal semantic correlations by enabling bidirectional interactions between infrared image features and LiDAR point cloud features. Extensive experiments demonstrate that SASENet achieves competitive performance on our self-constructed dataset, particularly excelling in long-range 3D object detection.
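The abstract describes the Bidirectional Feature Interaction Module (BFIM) only at a high level. The following is a minimal PyTorch sketch of what a dual-stream bidirectional interaction between infrared-image features and LiDAR point-cloud features could look like, using cross-attention in both directions. The class and argument names (BidirectionalFeatureInteraction, img_feats, pc_feats) and the choice of multi-head attention with residual fusion are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of a BFIM-style block: two cross-attention streams that let
# each modality query the other, followed by residual fusion. Details are
# assumed, since the paper's exact layer design is not given in the abstract.
import torch
import torch.nn as nn


class BidirectionalFeatureInteraction(nn.Module):
    """Exchange information between infrared-image tokens and LiDAR tokens."""

    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        # LiDAR queries attend to image keys/values (image -> LiDAR direction).
        self.img_to_pc = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Image queries attend to LiDAR keys/values (LiDAR -> image direction).
        self.pc_to_img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_pc = nn.LayerNorm(dim)
        self.norm_img = nn.LayerNorm(dim)

    def forward(self, img_feats: torch.Tensor, pc_feats: torch.Tensor):
        # img_feats: (B, N_img, C) flattened infrared feature map
        # pc_feats:  (B, N_pc,  C) flattened point-cloud / BEV feature tokens
        pc_enh, _ = self.img_to_pc(query=pc_feats, key=img_feats, value=img_feats)
        img_enh, _ = self.pc_to_img(query=img_feats, key=pc_feats, value=pc_feats)
        # Residual fusion keeps each stream's original features intact.
        return self.norm_img(img_feats + img_enh), self.norm_pc(pc_feats + pc_enh)


if __name__ == "__main__":
    bfim = BidirectionalFeatureInteraction(dim=128, num_heads=4)
    img = torch.randn(2, 1024, 128)  # e.g. a 32x32 infrared feature map, flattened
    pc = torch.randn(2, 2048, 128)   # e.g. 2048 voxel/pillar tokens
    img_out, pc_out = bfim(img, pc)
    print(img_out.shape, pc_out.shape)
```

Under these assumptions, each stream retains its own representation while gaining cross-modal context, which matches the abstract's description of enhancing cross-modal semantic correlations through bidirectional interaction.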
Journal overview:
The Journal covers the entire field of infrared physics and technology: theory, experiment, application, devices and instrumentation. "Infrared" is defined as covering the near, mid and far infrared (terahertz) regions from 0.75 µm (750 nm) to 1 mm (300 GHz). Submissions in the 300 GHz to 100 GHz region may be accepted at the editors' discretion if their content is relevant to shorter wavelengths. Submissions must be primarily concerned with and directly relevant to this spectral region.
Its core topics can be summarized as the generation, propagation and detection of infrared radiation; the associated optics, materials and devices; and its use in all fields of science, industry, engineering and medicine.
Infrared techniques occur in many different fields, notably spectroscopy and interferometry; material characterization and processing; atmospheric physics, astronomy and space research. Scientific aspects include lasers, quantum optics, quantum electronics, image processing and semiconductor physics. Some important applications are medical diagnostics and treatment, industrial inspection and environmental monitoring.