Spatial-Temporal Transformer for Crime Recognition in Surveillance Videos

2022 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). Pub Date: 2022-11-29. DOI: 10.1109/AVSS56176.2022.9959414

Kayleigh Boekhoudt, Estefanía Talavera

Citations: 1

Abstract

Recognizing human-related crime in surveillance videos becomes even more challenging when the actions to be distinguished are relatively similar. We propose a transformer-based model that relies on the spatial-temporal representation of extracted skeletal trajectories for fine-grained classification. We validate the effectiveness of our model on the complex HR-Crime dataset, which consists of videos representing 13 categories of human-related crimes. Quantitative and qualitative results suggest that building a transformer architecture with coupled spatial and temporal modules enables the model to achieve competitive performance while improving intrinsic interpretability.
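To make the idea of coupled spatial and temporal modules concrete, the following is a minimal sketch of factorised attention over a skeleton sequence. It is not the authors' architecture, which the abstract does not specify. It assumes the input is indexed as `seq[t][j]` (frame `t`, joint `j`, a short coordinate vector) and, for brevity, omits the learned query/key/value projections, feed-forward layers, and classification head that a real transformer would have. The names `self_attention` and `spatial_temporal_block` are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    Assumption: no learned projections, purely for illustration.
    Returns (outputs, weights), where weights[i][k] is how much
    token i attends to token k.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[i] * tokens[i][c] for i in range(len(tokens))) for c in range(d)]
        )
    return outputs, weights


def spatial_temporal_block(seq):
    """Apply attention across joints per frame, then across frames per joint."""
    n_frames, n_joints = len(seq), len(seq[0])
    # Spatial module: joints within the same frame attend to each other.
    spatial = [self_attention(frame)[0] for frame in seq]
    # Temporal module: each joint's trajectory attends across frames.
    out = [[None] * n_joints for _ in range(n_frames)]
    temporal_weights = []
    for j in range(n_joints):
        track = [spatial[t][j] for t in range(n_frames)]
        attended, w = self_attention(track)
        temporal_weights.append(w)
        for t in range(n_frames):
            out[t][j] = attended[t]
    return out, temporal_weights


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical input: 4 frames, 17 joints, 2D coordinates.
    seq = [[[random.random(), random.random()] for _ in range(17)] for _ in range(4)]
    out, tw = spatial_temporal_block(seq)
    print(len(out), len(out[0]), len(out[0][0]))
```

In a sketch like this, the returned attention weights can be inspected directly to see which frames a joint's trajectory attends to, which is one sense in which attention-based models are described as intrinsically interpretable.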

