Deformable Cross-Attention Transformer for Weakly Aligned RGB–T Pedestrian Detection

IF 9.7 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Multimedia Pub Date : 2025-02-18 DOI:10.1109/TMM.2025.3543056

Yu Hu;Xiaobo Chen;Sheng Wang;Luyang Liu;Hengyang Shi;Lihong Fan;Jing Tian;Jun Liang

IEEE Transactions on Multimedia, vol. 27, pp. 4400-4411. Journal Article. https://ieeexplore.ieee.org/document/10891492/

Citations: 0

Abstract

Pedestrian detection plays a crucial role in autonomous driving systems. To ensure reliable and effective detection in challenging conditions, researchers have proposed RGB–T (RGB–thermal) detectors that integrate thermal images with color images for more complementary feature representations. However, existing methods struggle to capture the spatial and geometric correlations between the two modalities, and they assume the modalities are perfectly synchronized, which is unrealistic in real-world scenarios. In response to these challenges, we present a new deformable-attention-based approach for weakly aligned RGB–T pedestrian detection. The proposed method uses a dual-branch cross-attention mechanism to capture the inherent spatial and geometric correlations between color and thermal images. Furthermore, it incorporates positional information for each image pixel into the sampling offset generation to enhance robustness in scenarios where modalities are not precisely aligned or registered. To reduce computational complexity, we introduce a local attention mechanism that, for each query, samples only a small set of keys and values within a limited region of the feature maps. Extensive experiments and ablation studies conducted on multiple public datasets confirm the effectiveness of the proposed framework.
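
The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single-head, single-scale layout, the sinusoidal positional encoding, the tanh bound used to keep sampling points inside a limited region, and every name (`positional_encoding`, `bilinear_sample`, `deformable_cross_attention`, `init_params`, `max_offset`) are assumptions made for illustration. Each query pixel in one modality predicts a few position-aware offsets, samples the other modality's features at those offset locations, and combines them with softmax weights.

```python
import numpy as np

def positional_encoding(h, w, c):
    """Assumed sinusoidal encoding of normalized (y, x); c must be divisible by 4."""
    ys, xs = np.meshgrid(np.linspace(0, 1, h), np.linspace(0, 1, w), indexing="ij")
    freqs = 2.0 ** np.arange(c // 4)
    parts = [f(2 * np.pi * coord[..., None] * freqs)
             for coord in (ys, xs) for f in (np.sin, np.cos)]
    return np.concatenate(parts, axis=-1)  # (h, w, c)

def bilinear_sample(feat, ys, xs):
    """Bilinearly sample feat (H, W, C) at float pixel coordinates, clipped to the map."""
    h, w, _ = feat.shape
    ys, xs = np.clip(ys, 0, h - 1), np.clip(xs, 0, w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[..., None], (xs - x0)[..., None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bot = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def deformable_cross_attention(query_feat, kv_feat, w_off, w_attn, w_val, max_offset=3.0):
    """One direction of cross-attention: queries from one modality, keys/values from the other."""
    h, w, c = query_feat.shape
    k = w_attn.shape[1]  # number of sampled points per query
    # Position-aware queries drive the offset generation.
    q = query_feat + positional_encoding(h, w, c)
    # tanh bounds each offset to [-max_offset, max_offset] pixels (local region; an assumption).
    offsets = max_offset * np.tanh(q @ w_off).reshape(h, w, k, 2)
    logits = q @ w_attn
    logits = logits - logits.max(axis=-1, keepdims=True)
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=-1, keepdims=True)  # softmax over the k points
    ref_y, ref_x = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    ys = ref_y[..., None] + offsets[..., 0]
    xs = ref_x[..., None] + offsets[..., 1]
    sampled = bilinear_sample(kv_feat @ w_val, ys, xs)  # (h, w, k, c)
    return (attn[..., None] * sampled).sum(axis=2)      # (h, w, c)

def init_params(rng, c, k):
    """Random projection weights, for demonstration only."""
    return (0.1 * rng.standard_normal((c, 2 * k)),
            0.1 * rng.standard_normal((c, k)),
            0.1 * rng.standard_normal((c, c)))

rng = np.random.default_rng(0)
H, W, C, K = 8, 6, 16, 4
rgb = rng.standard_normal((H, W, C))
thermal = rng.standard_normal((H, W, C))

# Dual-branch use: each modality queries the other, with a residual connection.
rgb_enh = rgb + deformable_cross_attention(rgb, thermal, *init_params(rng, C, K))
thermal_enh = thermal + deformable_cross_attention(thermal, rgb, *init_params(rng, C, K))
```

With `K` sampled points per query, the cost per query scales with `K` rather than with the full `H * W` key set of global attention, which is the complexity saving the abstract refers to.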


Source journal

IEEE Transactions on Multimedia (Engineering & Technology: Telecommunications)

CiteScore: 11.70

Self-citation rate: 11.00%

Articles published: 576

Review time: 5.5 months

Journal description: The IEEE Transactions on Multimedia covers diverse aspects of multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.