Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms

Xiao Wang, Shiao Wang, Pengpeng Shao, Bo Jiang, Lin Zhu, Yonghong Tian
{"title":"Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms","authors":"Xiao Wang, Shiao Wang, Pengpeng Shao, Bo Jiang, Lin Zhu, Yonghong Tian","doi":"arxiv-2408.09764","DOIUrl":null,"url":null,"abstract":"Human Action Recognition (HAR) stands as a pivotal research domain in both\ncomputer vision and artificial intelligence, with RGB cameras dominating as the\npreferred tool for investigation and innovation in this field. However, in\nreal-world applications, RGB cameras encounter numerous challenges, including\nlight conditions, fast motion, and privacy concerns. Consequently, bio-inspired\nevent cameras have garnered increasing attention due to their advantages of low\nenergy consumption, high dynamic range, etc. Nevertheless, most existing\nevent-based HAR datasets are low resolution ($346 \\times 260$). In this paper,\nwe propose a large-scale, high-definition ($1280 \\times 800$) human action\nrecognition dataset based on the CeleX-V event camera, termed CeleX-HAR. It\nencompasses 150 commonly occurring action categories, comprising a total of\n124,625 video sequences. Various factors such as multi-view, illumination,\naction speed, and occlusion are considered when recording these data. To build\na more comprehensive benchmark dataset, we report over 20 mainstream HAR models\nfor future works to compare. In addition, we also propose a novel Mamba vision\nbackbone network for event stream based HAR, termed EVMamba, which equips the\nspatial plane multi-directional scanning and novel voxel temporal scanning\nmechanism. By encoding and mining the spatio-temporal information of event\nstreams, our EVMamba has achieved favorable results across multiple datasets.\nBoth the dataset and source code will be released on\n\\url{https://github.com/Event-AHU/CeleX-HAR}","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Neural and Evolutionary Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.09764","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Human Action Recognition (HAR) is a pivotal research domain in both computer vision and artificial intelligence, with RGB cameras dominating as the preferred tool for investigation and innovation in this field. In real-world applications, however, RGB cameras face numerous challenges, including difficult lighting conditions, fast motion, and privacy concerns. Consequently, bio-inspired event cameras have garnered increasing attention thanks to advantages such as low energy consumption and high dynamic range. Nevertheless, most existing event-based HAR datasets are low resolution ($346 \times 260$). In this paper, we propose a large-scale, high-definition ($1280 \times 800$) human action recognition dataset recorded with the CeleX-V event camera, termed CeleX-HAR. It covers 150 commonly occurring action categories and comprises a total of 124,625 video sequences. Factors such as multiple viewpoints, illumination, action speed, and occlusion were considered when recording the data. To build a more comprehensive benchmark, we report results for more than 20 mainstream HAR models as baselines for future work to compare against. In addition, we propose a novel Mamba vision backbone for event-stream-based HAR, termed EVMamba, which is equipped with multi-directional scanning over the spatial plane and a novel voxel temporal scanning mechanism. By encoding and mining the spatio-temporal information of event streams, EVMamba achieves favorable results across multiple datasets. Both the dataset and the source code will be released at \url{https://github.com/Event-AHU/CeleX-HAR}.
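The abstract names two scanning mechanisms without detailing them. As a point of reference, the sketch below illustrates the general idea behind multi-directional spatial scanning (a VMamba-style cross-scan) and a simple temporal ordering of voxel tokens. The function names and tensor layouts are illustrative assumptions, not the authors' released EVMamba implementation.

```python
import torch

def multi_directional_scan(feat: torch.Tensor) -> torch.Tensor:
    """Cross-scan sketch: flatten a (B, C, H, W) feature map into four
    1-D token orders (row-major, column-major, and their reverses) so a
    causal Mamba block can traverse each token from four directions.
    Returns (B, 4, C, H*W). Hypothetical helper, not the paper's code."""
    B, C, H, W = feat.shape
    row = feat.flatten(2)                  # (B, C, H*W), row-major order
    col = feat.transpose(2, 3).flatten(2)  # column-major order
    return torch.stack([row, col, row.flip(-1), col.flip(-1)], dim=1)

def voxel_temporal_scan(voxels: torch.Tensor) -> torch.Tensor:
    """Order event-voxel tokens along the temporal axis before a
    selective scan. voxels: (B, T, N, C) -> (B, T*N, C), where T is the
    number of temporal bins and N the tokens per bin (assumed layout)."""
    B, T, N, C = voxels.shape
    return voxels.reshape(B, T * N, C)

if __name__ == "__main__":
    x = torch.randn(2, 16, 8, 8)            # dummy event-frame features
    print(multi_directional_scan(x).shape)  # torch.Size([2, 4, 16, 64])
    v = torch.randn(2, 5, 64, 16)           # 5 temporal bins, 64 tokens each
    print(voxel_temporal_scan(v).shape)     # torch.Size([2, 320, 16])
```

Multiple scan orders matter because a state-space model processes tokens causally: scanning the spatial plane in four directions gives every position context from its left, right, top, and bottom neighbors, while the temporal ordering lets the same machinery aggregate information across voxel time bins.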