Towards Hierarchical Temporal Excitation for Video Violence Recognition

IF 5.3 | CAS Tier 3 (Computer Science) | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Aihua Mao;Wanqing Wu;Wenwei Yan;Yuxiang Li;Haoxiang Wang
{"title":"Towards Hierarchical Temporal Excitation for Video Violence Recognition","authors":"Aihua Mao;Wanqing Wu;Wenwei Yan;Yuxiang Li;Haoxiang Wang","doi":"10.1109/TETCI.2024.3522201","DOIUrl":null,"url":null,"abstract":"Video-based violence recognition has become a crucial research topic with the wide usage of surveillance cameras. However, recognizing violent behavior from video data is challenging because of the additional temporal dimension, the lack of a precise range of violent behavior, and the complex backgrounds that make recognizing the interaction between objects difficult. Previous works have ambiguous reasoning of temporal features and insufficient understanding of action relationships. To address these issues, we propose a hierarchical temporal excitation network, which is effective for learning deep object interactions in spatio-temporal information and utilizing the interaction to robustly identify violent behaviors even in complex scenarios. The model we proposed comprises of two modules for temporal excitation, namely the shift temporal adaptive module (STAM) and the sparse object interaction transformer module (SOI-Tr). STAM extracts coarse-grained temporal information by fusing the shift component with the temporal adaptive modeling component. Furthermore, considering that deep-layer temporal features are more conducive to network understanding, SOI-Tr is introduced to excite fine-grained temporal representation reasoning by critical object attention. We conduct extensive experiments on mainstream violence datasets and a new constructed multi-class violence (MCV) dataset. The results show that our method outperforms the state-of-the-art works and is superior in understanding the object interaction in violent behavior recognition.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 4","pages":"3025-3038"},"PeriodicalIF":5.3000,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Emerging Topics in Computational Intelligence","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10829685/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Video-based violence recognition has become a crucial research topic with the wide usage of surveillance cameras. However, recognizing violent behavior from video data is challenging because of the additional temporal dimension, the lack of a precise range of violent behavior, and the complex backgrounds that make it difficult to recognize interactions between objects. Previous works suffer from ambiguous reasoning about temporal features and an insufficient understanding of action relationships. To address these issues, we propose a hierarchical temporal excitation network, which is effective for learning deep object interactions in spatio-temporal information and utilizing these interactions to robustly identify violent behaviors even in complex scenarios. The proposed model comprises two modules for temporal excitation, namely the shift temporal adaptive module (STAM) and the sparse object interaction transformer module (SOI-Tr). STAM extracts coarse-grained temporal information by fusing the shift component with the temporal adaptive modeling component. Furthermore, considering that deep-layer temporal features are more conducive to network understanding, SOI-Tr is introduced to excite fine-grained temporal representation reasoning through critical object attention. We conduct extensive experiments on mainstream violence datasets and a newly constructed multi-class violence (MCV) dataset. The results show that our method outperforms state-of-the-art works and is superior in understanding object interactions in violent behavior recognition.
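The abstract only outlines the STAM design, so the following is a minimal PyTorch sketch of the coarse-grained temporal-excitation idea it describes: part of the channels is shifted along the temporal axis (the shift component), and the result is fused with a lightweight per-channel temporal adaptive weighting (the temporal adaptive modeling component). The module and parameter names (ShiftTemporalAdaptiveSketch, shift_fraction, kernel_size) are hypothetical illustrations, not the authors' actual STAM implementation.

```python
# A hedged sketch of a shift + temporal-adaptive fusion block, assuming
# TSM-style channel shifting and a depth-wise temporal convolution as the
# adaptive component. Not the paper's STAM; names are illustrative only.
import torch
import torch.nn as nn


class ShiftTemporalAdaptiveSketch(nn.Module):
    def __init__(self, channels: int, shift_fraction: float = 0.25, kernel_size: int = 3):
        super().__init__()
        self.fold = max(1, int(channels * shift_fraction) // 2)
        # Depth-wise 1-D convolution over time produces adaptive per-channel
        # temporal weights (a stand-in for the temporal adaptive modeling component).
        self.temporal_conv = nn.Conv1d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels, bias=False
        )
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, height, width)
        b, t, c, h, w = x.shape

        # Shift component: move one chunk of channels forward in time and
        # another chunk backward, leaving the remaining channels in place.
        shifted = torch.zeros_like(x)
        shifted[:, 1:, :self.fold] = x[:, :-1, :self.fold]
        shifted[:, :-1, self.fold:2 * self.fold] = x[:, 1:, self.fold:2 * self.fold]
        shifted[:, :, 2 * self.fold:] = x[:, :, 2 * self.fold:]

        # Temporal adaptive component: spatially pooled features drive a
        # per-channel temporal gate that re-weights each frame's channels.
        pooled = x.mean(dim=(3, 4)).transpose(1, 2)      # (b, c, t)
        weights = self.gate(self.temporal_conv(pooled))  # (b, c, t)
        weights = weights.transpose(1, 2).view(b, t, c, 1, 1)

        # Fuse the two branches into a coarse-grained temporal representation.
        return shifted * weights


if __name__ == "__main__":
    clip = torch.randn(2, 8, 64, 14, 14)   # 2 clips, 8 frames, 64 channels
    out = ShiftTemporalAdaptiveSketch(channels=64)(clip)
    print(out.shape)                        # torch.Size([2, 8, 64, 14, 14])
```

The fine-grained SOI-Tr stage, which the abstract describes as transformer attention over critical objects, would sit on top of features like these; its details are not given here, so it is omitted from the sketch.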
Source Journal
CiteScore: 10.30
Self-citation rate: 7.50%
Annual publications: 147
About the journal: The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys. TETCI is an electronics only publication. TETCI publishes six issues per year. Authors are encouraged to submit manuscripts in any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. A few such illustrative examples are glial cell networks, computational neuroscience, Brain Computer Interface, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, computational intelligence for the IoT and Smart-X technologies.