Towards Hierarchical Temporal Excitation for Video Violence Recognition
Aihua Mao; Wanqing Wu; Wenwei Yan; Yuxiang Li; Haoxiang Wang
IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 4, pp. 3025-3038. DOI: 10.1109/TETCI.2024.3522201
Abstract
Video-based violence recognition has become a crucial research topic with the wide use of surveillance cameras. However, recognizing violent behavior from video data is challenging because of the additional temporal dimension, the lack of a precise definition of the range of violent behavior, and complex backgrounds that make it difficult to recognize the interactions between objects. Previous works suffer from ambiguous reasoning about temporal features and an insufficient understanding of action relationships. To address these issues, we propose a hierarchical temporal excitation network that learns deep object interactions from spatio-temporal information and exploits these interactions to robustly identify violent behaviors even in complex scenarios. The proposed model comprises two temporal excitation modules: the shift temporal adaptive module (STAM) and the sparse object interaction transformer module (SOI-Tr). STAM extracts coarse-grained temporal information by fusing a shift component with a temporal adaptive modeling component. Furthermore, considering that deep-layer temporal features are more conducive to network understanding, SOI-Tr is introduced to excite fine-grained temporal representation reasoning through attention on critical objects. We conduct extensive experiments on mainstream violence datasets and a newly constructed multi-class violence (MCV) dataset. The results show that our method outperforms state-of-the-art works and is superior in understanding object interactions in violent behavior recognition.
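To make the coarse-grained temporal excitation concrete, the sketch below illustrates one plausible STAM-like block: a TSM-style channel shift along the time axis fused with a lightweight temporal-adaptive reweighting branch. This is a minimal sketch under assumed design choices (class name, shift ratio, the Conv1d attention branch, and fusion by element-wise modulation are all illustrative assumptions), not the authors' released implementation.

```python
import torch
import torch.nn as nn


class ShiftTemporalAdaptiveBlock(nn.Module):
    """Hypothetical STAM-like block: channel shift + temporal-adaptive reweighting."""

    def __init__(self, channels: int, shift_ratio: float = 0.125):
        super().__init__()
        self.shift_div = max(1, int(1 / shift_ratio))
        # Temporal-adaptive branch: per-channel attention over the time axis (assumed design).
        self.temporal_attn = nn.Sequential(
            nn.Conv1d(channels, channels // 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels // 4, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, height, width)
        b, t, c, h, w = x.shape
        fold = c // self.shift_div

        # Shift component (TSM-style): move a slice of channels one step along time.
        shifted = x.clone()
        shifted[:, 1:, :fold] = x[:, :-1, :fold]                  # shift forward in time
        shifted[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]  # shift backward in time

        # Temporal-adaptive component: channel-wise weights computed from the temporal profile.
        pooled = x.mean(dim=[3, 4])                        # (b, t, c) global spatial pooling
        attn = self.temporal_attn(pooled.transpose(1, 2))  # (b, c, t)
        attn = attn.transpose(1, 2).view(b, t, c, 1, 1)

        # Fuse the two components (assumed: element-wise modulation of the shifted path).
        return shifted * attn


if __name__ == "__main__":
    # Usage: an 8-frame clip with 64 channels and 56x56 feature maps.
    block = ShiftTemporalAdaptiveBlock(channels=64)
    clip = torch.randn(2, 8, 64, 56, 56)
    print(block(clip).shape)  # torch.Size([2, 8, 64, 56, 56])
```

The fine-grained stage (SOI-Tr) would then operate on the deeper features produced by such blocks, attending sparsely to critical objects; its details are not specified in the abstract and are therefore not sketched here.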
Journal Introduction
The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys.
TETCI is an electronic-only publication and publishes six issues per year.
Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. Illustrative examples include glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.