{"title":"Cross-Modal Adaptation for Object Detection in Infrared Remote Sensing Imagery","authors":"Zeyu Wang;Shuaiting Li;Kejie Huang","doi":"10.1109/LGRS.2025.3527560","DOIUrl":null,"url":null,"abstract":"Modern infrared (IR) technology has been proven highly significant in remote sensing imagery (RSI). Currently, multimodal RSI object detection based on red-green–blue (RGB)-IR image pairs has attracted widespread research. However, capturing features in the IR domain poses a challenge, as existing object detectors heavily focus on chromatic information in the RGB domain. Furthermore, the quality of RGB images can be influenced by complex environmental conditions, limiting the practicality of multimodal detection. In this letter, we introduce cross-modal-you only look once (CM-YOLO), a lightweight yet effective object detector specifically designed for IR remote sensing images. CM-YOLO employs cross-modal adaptation to enhance the awareness of IR-RGB modality translation. Specifically, we leverage a prior modality translator (PMT) to learn the infrared-visible (IV) features, which are incorporated into the detection backbone using our IV-gate modules. Experimental results on the VEDAI dataset demonstrate that CM-YOLO significantly outperforms conventional methods. 
Moreover, CM-YOLO exhibits a strong generalization ability for IR-based object detection in urban scenes on the FLIR dataset.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":4.4000,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10835167/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Modern infrared (IR) technology has proven highly significant in remote sensing imagery (RSI). Currently, multimodal RSI object detection based on red-green-blue (RGB)-IR image pairs has attracted widespread research interest. However, capturing features in the IR domain poses a challenge, as existing object detectors focus heavily on chromatic information in the RGB domain. Furthermore, the quality of RGB images can be degraded by complex environmental conditions, limiting the practicality of multimodal detection. In this letter, we introduce cross-modal you only look once (CM-YOLO), a lightweight yet effective object detector designed specifically for IR remote sensing images. CM-YOLO employs cross-modal adaptation to enhance awareness of IR-RGB modality translation. Specifically, we leverage a prior modality translator (PMT) to learn infrared-visible (IV) features, which are incorporated into the detection backbone through our IV-gate modules. Experimental results on the VEDAI dataset demonstrate that CM-YOLO significantly outperforms conventional methods. Moreover, CM-YOLO exhibits strong generalization for IR-based object detection in urban scenes on the FLIR dataset.
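The abstract describes IV-gate modules that inject translated infrared-visible features into the detection backbone. The letter does not give the module's exact formulation, so the following is only an illustrative sketch of one common gating pattern it could plausibly resemble: a sigmoid gate computed from the auxiliary (IV) features modulates how much of them is blended into the IR backbone features. The function name `iv_gate`, the linear gate parameterization `w`, and the residual-style blend are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    # Numerically plain logistic function; maps reals into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def iv_gate(ir_feat, iv_feat, w):
    """Hypothetical gated fusion: a sigmoid gate derived from the
    translated IV features decides, per element, how much IV
    information is added to the IR backbone features."""
    gate = sigmoid(iv_feat @ w)       # gate values in (0, 1)
    return ir_feat + gate * iv_feat   # gated residual injection

# Toy shapes: 4 spatial positions, 8 channels.
rng = np.random.default_rng(0)
C = 8
ir = rng.standard_normal((4, C))  # IR backbone features
iv = rng.standard_normal((4, C))  # features from the modality translator
w = rng.standard_normal((C, C))   # assumed learnable gate weights
fused = iv_gate(ir, iv, w)
print(fused.shape)  # (4, 8)
```

When the gate saturates near zero the backbone falls back to pure IR features, which matches the abstract's motivation that RGB-derived information may be unreliable under complex environmental conditions.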