Temporal Pyramid Alignment and Adaptive Fusion of Event Stream and Image Frame for Keypoint Detection and Tracking in Autonomous Driving

Impact Factor: 6.2 · CAS Zone 2 (Engineering & Technology) · Q1, ENGINEERING, MULTIDISCIPLINARY
Peijun Shi, Chee-Onn Chow, Wei Ru Wong
{"title":"Temporal Pyramid Alignment and Adaptive Fusion of Event Stream and Image Frame for Keypoint Detection and Tracking in Autonomous Driving","authors":"Peijun Shi,&nbsp;Chee-Onn Chow,&nbsp;Wei Ru Wong","doi":"10.1016/j.aej.2025.04.098","DOIUrl":null,"url":null,"abstract":"<div><div>This paper proposes a method to address the alignment and fusion challenges in multimodal fusion between event and RGB cameras. For multimodal alignment, we adopt the Temporal Pyramid Alignment mechanism to achieve multi-scale temporal synchronization of event streams and RGB frames. For multimodal fusion, we design a module that employs adaptive fusion to dynamically adjust the contribution of each modality based on scene complexity and feature quality. A gating network computes fusion weights by considering both relative modality importance and noise characteristics. A Cross-Modal Feature Compensation module is integrated into the framework to enhance information utilization. Additionally, the framework incorporates a Dynamic Inference Path Selection mechanism, guided by input complexity, to optimize computational resource allocation, along with a dynamic noise suppression mechanism to improve the robustness of feature extraction. Experimental results on the DSEC dataset demonstrate that the proposed method achieves a 36.9% mAP and 40.1% tracking success rate, particularly effective in extreme lighting and fast motion scenarios, surpassing existing approaches by 1.8% mAP and 1.6% SR, while maintaining real-time efficiency at 13.1 FPS. This work provides an important solution for applications in autonomous driving, robotics, and augmented reality, where robust multimodal perception under dynamic conditions is critical.</div></div>","PeriodicalId":7484,"journal":{"name":"alexandria engineering journal","volume":"127 ","pages":"Pages 228-238"},"PeriodicalIF":6.2000,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"alexandria engineering journal","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1110016825005940","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0

Abstract

This paper proposes a method to address the alignment and fusion challenges in multimodal perception with event and RGB cameras. For multimodal alignment, we adopt a Temporal Pyramid Alignment mechanism to achieve multi-scale temporal synchronization of event streams and RGB frames. For multimodal fusion, we design a module that employs adaptive fusion to dynamically adjust the contribution of each modality based on scene complexity and feature quality. A gating network computes the fusion weights by considering both the relative importance of each modality and its noise characteristics. A Cross-Modal Feature Compensation module is integrated into the framework to enhance information utilization. Additionally, the framework incorporates a Dynamic Inference Path Selection mechanism, guided by input complexity, to optimize computational resource allocation, along with a dynamic noise suppression mechanism to improve the robustness of feature extraction. Experimental results on the DSEC dataset demonstrate that the proposed method achieves 36.9% mAP and a 40.1% tracking success rate (SR), is particularly effective in extreme lighting and fast-motion scenarios, and surpasses existing approaches by 1.8% mAP and 1.6% SR while maintaining real-time efficiency at 13.1 FPS. This work provides an important solution for applications in autonomous driving, robotics, and augmented reality, where robust multimodal perception under dynamic conditions is critical.
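To make the fusion step concrete, below is a minimal PyTorch sketch of the gated, noise-aware fusion idea the abstract describes: a small gating network turns pooled feature statistics and a crude noise proxy into two softmax-normalized per-modality weights. The class name AdaptiveGatedFusion, the layer sizes, and the variance-based noise estimate are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGatedFusion(nn.Module):
    """Illustrative sketch (not the paper's code) of adaptive fusion:
    a gating MLP maps pooled event/RGB descriptors plus a noise proxy
    to two fusion weights that blend the modality feature maps."""

    def __init__(self, channels: int = 256):
        super().__init__()
        # Gating network: (event descriptor, RGB descriptor, two scalar
        # noise estimates) -> two unnormalized modality scores.
        self.gate = nn.Sequential(
            nn.Linear(2 * channels + 2, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 2),
        )

    def forward(self, f_event: torch.Tensor, f_rgb: torch.Tensor) -> torch.Tensor:
        # f_event, f_rgb: (B, C, H, W) feature maps from the two branches.
        b = f_event.size(0)
        # Global average pooling gives a compact per-modality descriptor.
        d_event = f_event.mean(dim=(2, 3))  # (B, C)
        d_rgb = f_rgb.mean(dim=(2, 3))      # (B, C)
        # Spatial variance as an assumed stand-in for the "noise
        # characteristics" the gating network is said to consider.
        n_event = f_event.var(dim=(2, 3)).mean(dim=1, keepdim=True)  # (B, 1)
        n_rgb = f_rgb.var(dim=(2, 3)).mean(dim=1, keepdim=True)      # (B, 1)
        # Softmax keeps the two weights positive and summing to one.
        scores = self.gate(torch.cat([d_event, d_rgb, n_event, n_rgb], dim=1))
        w = F.softmax(scores, dim=1)
        w_event = w[:, 0].view(b, 1, 1, 1)
        w_rgb = w[:, 1].view(b, 1, 1, 1)
        return w_event * f_event + w_rgb * f_rgb

# Usage: fuse two feature maps of equal shape.
fusion = AdaptiveGatedFusion(channels=256)
fused = fusion(torch.randn(2, 256, 64, 80), torch.randn(2, 256, 64, 80))

The softmax constraint keeps the two weights complementary, so when the variance proxy flags one modality as unreliable (e.g., RGB frames under extreme lighting), the gate can shift weight toward the other modality smoothly, without hard switching.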
Source Journal

Alexandria Engineering Journal (Engineering: General Engineering)
CiteScore: 11.20
Self-citation rate: 4.40%
Articles per year: 1015
Review time: 43 days
Journal overview: Alexandria Engineering Journal is an international journal devoted to publishing high-quality papers in the field of engineering and applied science. Alexandria Engineering Journal is cited in the Engineering Information Services (EIS) and the Chemical Abstracts (CA). The papers published in Alexandria Engineering Journal are grouped into five sections, according to the following classification:
• Mechanical, Production, Marine and Textile Engineering
• Electrical Engineering, Computer Science and Nuclear Engineering
• Civil and Architecture Engineering
• Chemical Engineering and Applied Sciences
• Environmental Engineering