SASENet: multimodal 3D object detection for Gm-APD LiDAR based on semantic and spatial enhancement
Yuanxue Ding, Dongyang Liu, Yanchen Qu, Dakuan Du, Guanlin Chen, Xuefeng Dong, Jianfeng Sun
Infrared Physics & Technology, Volume 151, Article 106145. Published 2025-09-08. DOI: 10.1016/j.infrared.2025.106145
https://www.sciencedirect.com/science/article/pii/S1350449525004384
Abstract
Three-dimensional (3D) object detection in point clouds, a critical component of intelligent perception, has attracted considerable research attention. However, the sparsity and lack of semantic information in point clouds generated by long-range Geiger-mode avalanche photodiode (Gm-APD) LiDAR pose significant challenges, as unimodal detection struggles to distinguish structurally similar objects. To address this limitation, we propose SASENet, a multimodal 3D object detection network that integrates semantic and spatial enhancements. Specifically, at the input stage, we introduce a Semantic Spatial Enhancement Module (SSEM). Horizontally, we align the interpolated Gm-APD LiDAR range image with the infrared image and generate semantically enhanced point clouds through semantic segmentation of the infrared image. Vertically, we upsample the sparse point clouds to obtain semantic-spatially enhanced point clouds, enriching their structural information. At the feature interaction stage, we propose a Bidirectional Feature Interaction Module (BFIM) based on a dual-stream architecture, which enhances cross-modal semantic correlations by enabling bidirectional interactions between infrared image features and LiDAR point cloud features. Extensive experiments demonstrate that SASENet achieves competitive performance on our self-constructed dataset, particularly excelling in long-range 3D object detection.
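The abstract describes the Bidirectional Feature Interaction Module (BFIM) only at a high level. The following is a minimal PyTorch sketch of what a dual-stream bidirectional interaction between infrared-image features and LiDAR point-cloud features could look like, using cross-attention in both directions. The class and argument names (BidirectionalFeatureInteraction, img_feats, pc_feats) and the choice of multi-head attention with residual fusion are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of a BFIM-style block: two cross-attention streams that let
# each modality query the other, followed by residual fusion. Details are
# assumed, since the paper's exact layer design is not given in the abstract.
import torch
import torch.nn as nn


class BidirectionalFeatureInteraction(nn.Module):
    """Exchange information between infrared-image tokens and LiDAR tokens."""

    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        # LiDAR queries attend to image keys/values (image -> LiDAR direction).
        self.img_to_pc = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Image queries attend to LiDAR keys/values (LiDAR -> image direction).
        self.pc_to_img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_pc = nn.LayerNorm(dim)
        self.norm_img = nn.LayerNorm(dim)

    def forward(self, img_feats: torch.Tensor, pc_feats: torch.Tensor):
        # img_feats: (B, N_img, C) flattened infrared feature map
        # pc_feats:  (B, N_pc,  C) flattened point-cloud / BEV feature tokens
        pc_enh, _ = self.img_to_pc(query=pc_feats, key=img_feats, value=img_feats)
        img_enh, _ = self.pc_to_img(query=img_feats, key=pc_feats, value=pc_feats)
        # Residual fusion keeps each stream's original features intact.
        return self.norm_img(img_feats + img_enh), self.norm_pc(pc_feats + pc_enh)


if __name__ == "__main__":
    bfim = BidirectionalFeatureInteraction(dim=128, num_heads=4)
    img = torch.randn(2, 1024, 128)  # e.g. a 32x32 infrared feature map, flattened
    pc = torch.randn(2, 2048, 128)   # e.g. 2048 voxel/pillar tokens
    img_out, pc_out = bfim(img, pc)
    print(img_out.shape, pc_out.shape)
```

Under these assumptions, each stream retains its own representation while gaining cross-modal context, which matches the abstract's description of enhancing cross-modal semantic correlations through bidirectional interaction.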
Journal overview:
The Journal covers the entire field of infrared physics and technology: theory, experiment, application, devices and instrumentation. "Infrared" is defined as covering the near, mid and far infrared (terahertz) regions from 0.75 µm (750 nm) to 1 mm (300 GHz). Submissions in the 300 GHz to 100 GHz region may be accepted at the editors' discretion if their content is relevant to shorter wavelengths. Submissions must be primarily concerned with and directly relevant to this spectral region.
Its core topics can be summarized as the generation, propagation and detection of infrared radiation; the associated optics, materials and devices; and its use in all fields of science, industry, engineering and medicine.
Infrared techniques occur in many different fields, notably spectroscopy and interferometry; material characterization and processing; atmospheric physics, astronomy and space research. Scientific aspects include lasers, quantum optics, quantum electronics, image processing and semiconductor physics. Some important applications are medical diagnostics and treatment, industrial inspection and environmental monitoring.