Event Tubelet Compressor: Generating Compact Representations for Event-Based Action Recognition

Bochen Xie, Yongjian Deng, Z. Shao, Hai Liu, Qingsong Xu, Youfu Li
2022 7th International Conference on Control, Robotics and Cybernetics (CRC), December 15, 2022. DOI: 10.1109/CRC55853.2022.10041200

Abstract

Event cameras asynchronously capture pixel-level intensity changes in scenes and output a stream of events. Compared with traditional frame-based cameras, they offer competitive imaging characteristics: low latency, high dynamic range, and low power consumption. This makes event cameras ideal for vision tasks in dynamic scenarios, such as human action recognition. The best-performing event-based algorithms convert events into frame-based representations and feed them into existing learning models. However, generating informative frames for long-duration event streams remains a challenge, because event cameras operate asynchronously without a fixed frame rate. In this work, we propose a novel frame-based representation named Compact Event Image (CEI) for action recognition. This representation is generated in a learnable way by a self-attention-based module named Event Tubelet Compressor (EVTC). The EVTC module adaptively summarizes the long-term dynamics and temporal patterns of events into a CEI frame set. EVTC can be combined with conventional video backbones for end-to-end event-based action recognition. We evaluate our approach on three benchmark datasets, and experimental results show it outperforms state-of-the-art methods by a large margin.
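The abstract gives no implementation details, but the general idea it describes — binning an event stream into frames, then using attention to pool many frames into a few compact ones — can be illustrated with a minimal NumPy sketch. Everything below is an assumption for illustration only: the function names (`events_to_frames`, `compress_tubelets`), the event tuple layout `(t, x, y, p)`, and the random query vectors standing in for learned parameters are not the authors' EVTC module.

```python
import numpy as np

def events_to_frames(events, T, H, W):
    """Bin (t, x, y, p) events into T polarity-signed count frames."""
    frames = np.zeros((T, H, W), dtype=np.float32)
    t = events[:, 0]
    span = max(float(t.max() - t.min()), 1e-9)
    # Map each timestamp to one of T temporal bins.
    bins = np.clip(((t - t.min()) / span * T).astype(int), 0, T - 1)
    for b, (_, x, y, p) in zip(bins, events):
        frames[b, int(y), int(x)] += 1.0 if p > 0 else -1.0
    return frames

def compress_tubelets(frames, K, seed=0):
    """Pool T frame tokens into K compact frames via cross-attention.

    The K query vectors stand in for learned parameters; here they
    are random, so this shows only the mechanics, not a trained model.
    """
    T = frames.shape[0]
    tokens = frames.reshape(T, -1)                 # (T, H*W) frame tokens
    d = tokens.shape[1]
    queries = np.random.default_rng(seed).standard_normal((K, d)) / np.sqrt(d)
    scores = queries @ tokens.T / np.sqrt(d)       # (K, T) attention logits
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over the T frames
    cei = attn @ tokens                            # each output mixes all frames
    return cei.reshape(K, frames.shape[1], frames.shape[2])

# Usage: compress 16 temporal slices into 4 compact frames.
rng = np.random.default_rng(1)
N = 500
events = np.column_stack([
    np.sort(rng.random(N)),                 # timestamps in [0, 1)
    rng.integers(0, 8, N).astype(float),    # x
    rng.integers(0, 8, N).astype(float),    # y
    rng.choice([-1.0, 1.0], N),             # polarity
])
frames = events_to_frames(events, T=16, H=8, W=8)
cei = compress_tubelets(frames, K=4)        # shape (4, 8, 8)
```

In a trained system the queries would be learned parameters and the pooled frames would feed a standard video backbone; the fixed softmax pooling here is only the shape of that computation.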