Spatio-Temporal Pyramid Keypoint Detection With Event Cameras

IF 11.1 | Zone 1, Engineering & Technology | Q1, Engineering, Electrical & Electronic

IEEE Transactions on Circuits and Systems for Video Technology. Pub Date: 2025-04-09. DOI: 10.1109/TCSVT.2025.3559299

Yuqing Zhu;Yuan Gao;Tianle Ding;Xiang Liu;Wenfei Yang;Tianzhu Zhang

Volume 35, Issue 9, pp. 9384-9397. Full text: https://ieeexplore.ieee.org/document/10960429/


Abstract

Event cameras are bio-inspired sensors with diverse advantages, including high temporal resolution and minimal power consumption. As a result, event cameras enjoy a wide range of applications in computer vision, among which event keypoint detection plays a vital role. However, repeatable event keypoint detection remains challenging because the lack of temporal inter-frame interaction leads to descriptors with limited temporal consistency, which restricts the ability to perceive keypoint motion. In addition, detectors learned from single-scale features are not suitable for event keypoints with significant motion speed differences in high-speed scenarios. To address these problems, we propose a novel Spatio-Temporal Pyramid Keypoint Detection Network (STPNet) for event cameras, built on a temporally consistent descriptor learning (TCL) module and a spatially diverse detector learning (SDL) module. The proposed STPNet enjoys several merits. First, the TCL module generates temporally consistent descriptors for specific keypoint motion patterns. Second, the SDL module produces spatially diverse detectors for applications in high-speed motion scenarios. Extensive experimental results on three challenging benchmarks show that our method notably outperforms state-of-the-art event keypoint detection methods. Specifically, STPNet outperforms the best event keypoint detection method by 0.21 px in reprojection error on Event-Camera, 4% in IoU on N-Caltech101, 0.13 px in reprojection error on HVGA ATIS Corner, and 5.94% in matching accuracy on DSEC.
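The abstract reports gains in reprojection error measured in pixels. As a rough illustration only, the sketch below shows one common way such a metric is computed for keypoints in planar scenes: warp keypoints from one time window with a known homography and average the pixel distance to their matched keypoints in another window. The function names, the homography, and the sample points are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import math


def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def mean_reprojection_error(H, src_points, dst_points):
    """Average pixel distance between warped source keypoints and their matches."""
    if not src_points or len(src_points) != len(dst_points):
        raise ValueError("need two non-empty point lists of equal length")
    total = 0.0
    for src, dst in zip(src_points, dst_points):
        u, v = apply_homography(H, src)
        total += math.hypot(u - dst[0], v - dst[1])
    return total / len(src_points)


# Hypothetical data: a pure translation by (+2, -1) pixels.
H_shift = [[1.0, 0.0, 2.0],
           [0.0, 1.0, -1.0],
           [0.0, 0.0, 1.0]]
src = [(10.0, 10.0), (20.0, 5.0)]
dst = [(12.0, 9.0), (22.0, 7.0)]  # the second match is off by 3 px vertically

error = mean_reprojection_error(H_shift, src, dst)
print(f"mean reprojection error: {error:.2f} px")
```

Under this definition a lower value means detections are more repeatable across time, which is why a reduction of a fraction of a pixel is reported as an improvement.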


Source journal

IEEE Transactions on Circuits and Systems for Video Technology (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 27.40%

Publications: 660

Review time: 5 months

About the journal: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.