Deformable Cross-Attention Transformer for Weakly Aligned RGB–T Pedestrian Detection

IF 9.7 | Zone 1: Computer Science | JCR Q1: COMPUTER SCIENCE, INFORMATION SYSTEMS
Yu Hu;Xiaobo Chen;Sheng Wang;Luyang Liu;Hengyang Shi;Lihong Fan;Jing Tian;Jun Liang
IEEE Transactions on Multimedia, vol. 27, pp. 4400-4411. Published 2025-02-18. Journal Article.
DOI: 10.1109/TMM.2025.3543056
URL: https://ieeexplore.ieee.org/document/10891492/
Citations: 0

Abstract

Pedestrian detection plays a crucial role in autonomous driving systems. To ensure reliable and effective detection in challenging conditions, researchers have proposed RGB–T (RGB–thermal) detectors that integrate thermal images with color images to obtain complementary feature representations. However, existing methods struggle to capture the spatial and geometric correlations between the two modalities, and they assume that the modalities are perfectly synchronized, which is unrealistic in real-world scenarios. In response to these challenges, we present a new deformable-attention-based approach for weakly aligned RGB–T pedestrian detection. The proposed method uses a dual-branch cross-attention mechanism to capture the inherent spatial and geometric correlations between color and thermal images. Furthermore, it incorporates the positional information of each image pixel into sampling-offset generation to improve robustness when the modalities are not precisely aligned or registered. To reduce computational complexity, we introduce a local attention mechanism that, for each query, samples only a small set of keys and values within a limited region of the feature maps. Extensive experiments and ablation studies on multiple public datasets confirm the effectiveness of the proposed framework.
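The abstract gives no equations, so the following is only a minimal NumPy sketch of the general idea it describes, not the authors' implementation. It assumes a single attention head, offsets predicted from the query feature concatenated with its normalized pixel position, and a tanh bound (the hypothetical `radius` parameter) to keep sampling inside a small local window. All names (`W_off`, `W_attn`, `W_val`, `radius`, `dual_branch_fusion`) are illustrative assumptions.

```python
import numpy as np


def bilinear_sample(feat, y, x):
    """Bilinearly sample feat (H, W, C) at a continuous (y, x); clamps to the border."""
    H, W, _ = feat.shape
    y = float(np.clip(y, 0, H - 1))
    x = float(np.clip(x, 0, W - 1))
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def deformable_cross_attention(query_feat, other_feat, W_off, W_attn, W_val, radius=2.0):
    """Each query pixel attends to K sampled points in the other modality.

    query_feat, other_feat: (H, W, C) feature maps of the two modalities.
    W_off:  (C + 2, 2K) maps [query feature, normalized (y, x)] to K offsets.
    W_attn: (C + 2, K)  maps the same input to K attention logits.
    W_val:  (C, C)      value projection.
    radius: assumed bound (in pixels) on each offset, i.e. the local window.
    """
    H, W, _ = query_feat.shape
    K = W_attn.shape[1]
    out = np.zeros_like(query_feat)
    for i in range(H):
        for j in range(W):
            # Positional information enters the offset and weight prediction.
            pos = np.array([i / max(H - 1, 1), j / max(W - 1, 1)])
            q = np.concatenate([query_feat[i, j], pos])
            offsets = radius * np.tanh(q @ W_off).reshape(K, 2)
            weights = softmax(q @ W_attn)
            # Only K keys/values are sampled per query, not the full map.
            sampled = np.stack(
                [bilinear_sample(other_feat, i + dy, j + dx) for dy, dx in offsets]
            )
            out[i, j] = weights @ (sampled @ W_val)
    return out


def dual_branch_fusion(rgb, thermal, params_rgb, params_thermal):
    """Two branches: RGB queries sample thermal features and vice versa (residual add)."""
    rgb_out = rgb + deformable_cross_attention(rgb, thermal, *params_rgb)
    thermal_out = thermal + deformable_cross_attention(thermal, rgb, *params_thermal)
    return rgb_out, thermal_out
```

Because each query touches only K sampled locations, the cost in this sketch grows with H·W·K rather than with (H·W)² as in dense cross-attention, and the learned offsets let a query look slightly away from its own coordinates, which is what allows it to compensate for a small misalignment between the two images.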
Source journal
IEEE Transactions on Multimedia (Engineering & Technology: Telecommunications)
CiteScore: 11.70
Self-citation rate: 11.00%
Articles published: 576
Review time: 5.5 months
About the journal: The IEEE Transactions on Multimedia covers diverse aspects of multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. Its scope aligns with the Fields of Interest of its sponsors.