An active learning driven deep spatio-textural acoustic feature ensemble assisted learning environment for violence detection in surveillance videos

Impact Factor: 5.1 · CAS Tier 2 (Engineering & Technology) · JCR Q1 (Engineering, Multidisciplinary)
Duba Sriveni, Dr. Loganathan R
{"title":"An active learning driven deep spatio-textural acoustic feature ensemble assisted learning environment for violence detection in surveillance videos","authors":"Duba Sriveni ,&nbsp;Dr.Loganathan R","doi":"10.1016/j.jestch.2025.102050","DOIUrl":null,"url":null,"abstract":"<div><div>In this paper, a novel and robust deep spatio-textural acoustic feature ensemble-assisted learning environment is proposed for violence detection in surveillance videos (DestaVNet). As the name indicates, the proposed DestaVNet model exploits visual and acoustic features to perform violence detection. Additionally, to ensure the scalability of the solution, it employs an active learning concept that retains optimally sufficient frames for further computation and thus reduces computational costs decisively. More specifically, the DestaVNet model initially splits input surveillance footage into acoustic and video frames, followed by multi-constraints active learning based on the most representative frame selection. It applied the least confidence (LC), entropy margin (EM), and margin sampling (MS) criteria to retain the optimal frames for further feature extraction. The DestaVNet model executes pre-processing and feature extraction separately over the frames and corresponding acoustic signals. It performs intensity equalization, histogram equalization, resizing and z-score normalization as pre-processing task, which is followed by deep spatio-textural feature extraction by using gray level co-occurrence matrix (GLCM), ResNet101 and SqueezeNet deep networks. On the other hand, the different acoustic features, including mel-frequency cepstral coefficient (MFCC), gammatone cepstral coefficient (GTCC), <span><math><mrow><mi>GTCC</mi><mo>-</mo><mi>Δ</mi></mrow></math></span>, harmonic to noise ratio (HNR), spectral features and pitch were obtained. These acoustic and spatio-textural features were fused to yield a composite audio-visual feature set, which was later processed for principal component analysis (PCA) to minimize redundancy, and k-NN as part of an ensemble classifier to enhance prediction accuracy, achieving superior performance. The z-score normalization was performed to alleviate the over-fitting problem. Finally, the retained feature sets were processed for two-class classification by using a heterogeneous ensemble learning model, embodying SVM, DT, k-NN, NB, and RF classifiers. Simulation results confirmed that the proposed DestaVNet model outperforms other existing violence prediction methods, where its superiority was affirmed in terms of the (99.92%), precision (99.67%), recall (99.29%) and F-Measure (0.992).</div></div>","PeriodicalId":48609,"journal":{"name":"Engineering Science and Technology-An International Journal-Jestech","volume":"66 ","pages":"Article 102050"},"PeriodicalIF":5.1000,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering Science and Technology-An International Journal-Jestech","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2215098625001053","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0

Abstract

In this paper, a novel and robust deep spatio-textural acoustic feature ensemble-assisted learning environment, DestaVNet, is proposed for violence detection in surveillance videos. As the name indicates, the DestaVNet model exploits both visual and acoustic features to perform violence detection. To keep the solution scalable, it employs an active learning concept that retains only an optimally sufficient subset of frames for further computation, thereby reducing computational cost substantially. More specifically, the DestaVNet model first splits the input surveillance footage into acoustic signals and video frames, and then performs multi-constraint active learning to select the most representative frames, applying the least confidence (LC), entropy margin (EM), and margin sampling (MS) criteria to retain the optimal frames for subsequent feature extraction. Pre-processing and feature extraction are carried out separately on the frames and on the corresponding acoustic signals. Intensity equalization, histogram equalization, resizing, and z-score normalization are applied as pre-processing, followed by deep spatio-textural feature extraction using the gray-level co-occurrence matrix (GLCM) together with the ResNet101 and SqueezeNet deep networks. In parallel, acoustic features including the mel-frequency cepstral coefficients (MFCC), gammatone cepstral coefficients (GTCC), GTCC-Δ, harmonic-to-noise ratio (HNR), spectral features, and pitch are extracted. The acoustic and spatio-textural features are fused into a composite audio-visual feature set, to which principal component analysis (PCA) is applied to minimize redundancy, while z-score normalization of the fused features helps alleviate over-fitting. Finally, the retained feature set is classified into violent and non-violent classes by a heterogeneous ensemble learning model comprising SVM, DT, k-NN, NB, and RF classifiers. Simulation results confirm that the proposed DestaVNet model outperforms existing violence prediction methods, achieving accuracy of 99.92%, precision of 99.67%, recall of 99.29%, and an F-measure of 0.992.
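For readers unfamiliar with the frame-selection step, the least confidence, margin sampling, and entropy criteria named above are standard active-learning acquisition functions. The following sketch gives their usual textbook definitions for a candidate frame x with class posterior P(y|x); the notation is assumed here and is not taken from the paper, which combines the three criteria as multi-constraint selection rules.

x^{*}_{LC} = \arg\max_{x}\,\bigl(1 - P(\hat{y}\mid x)\bigr), \qquad \hat{y} = \arg\max_{y} P(y\mid x)
x^{*}_{MS} = \arg\min_{x}\,\bigl(P(\hat{y}_{1}\mid x) - P(\hat{y}_{2}\mid x)\bigr), \qquad \hat{y}_{1},\hat{y}_{2}\ \text{the two most probable labels}
x^{*}_{H} = \arg\max_{x}\,\Bigl(-\textstyle\sum_{y} P(y\mid x)\,\log P(y\mid x)\Bigr)

Frames scoring highest under these uncertainty measures are the most informative ones, so retaining only them shrinks the set of frames that must pass through pre-processing and feature extraction.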
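As an illustration of the spatio-textural branch, a minimal sketch in Python is shown below. This is not the authors' code: the GLCM property set, image size, and the use of ImageNet-pretrained ResNet101 and SqueezeNet purely as frozen feature extractors are assumptions, and the paper's intensity/histogram equalization steps are omitted.

import numpy as np
import torch
from PIL import Image
from skimage.feature import graycomatrix, graycoprops  # scikit-image >= 0.19
from torchvision import models, transforms

# Pretrained backbones used only as fixed feature extractors (ImageNet weights, assumed).
resnet = models.resnet101(weights=models.ResNet101_Weights.DEFAULT).eval()
resnet_trunk = torch.nn.Sequential(*list(resnet.children())[:-1])   # pooled 2048-d output
squeeze = models.squeezenet1_1(weights=models.SqueezeNet1_1_Weights.DEFAULT).eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def glcm_features(gray_u8):
    """Textural statistics from a gray-level co-occurrence matrix (property set assumed)."""
    glcm = graycomatrix(gray_u8, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "dissimilarity", "homogeneity", "energy", "correlation"]
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])

@torch.no_grad()
def deep_features(pil_img):
    """Concatenate ResNet101 (2048-d) and SqueezeNet (512-d, globally pooled) embeddings."""
    x = preprocess(pil_img).unsqueeze(0)
    r = resnet_trunk(x).flatten(1)
    s = torch.nn.functional.adaptive_avg_pool2d(squeeze.features(x), 1).flatten(1)
    return torch.cat([r, s], dim=1).squeeze(0).numpy()

def frame_descriptor(path):
    """Spatio-textural descriptor for one retained frame: GLCM statistics + CNN embeddings."""
    img = Image.open(path).convert("RGB")
    gray = np.array(img.convert("L"), dtype=np.uint8)
    return np.concatenate([glcm_features(gray), deep_features(img)])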
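The acoustic branch can be approximated with librosa, as in the sketch below. This is only an assumed illustration: MFCC deltas stand in for the paper's GTCC-Δ, and GTCC and HNR are not available in librosa, so they are left out; the sampling rate, coefficient counts, and mean/std pooling are likewise assumptions.

import librosa
import numpy as np

def acoustic_descriptor(path):
    """Clip-level acoustic descriptor: MFCC, deltas, spectral features, and pitch (assumed setup)."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    delta = librosa.feature.delta(mfcc)                       # deltas shown for MFCC, not GTCC
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
    f0 = librosa.yin(y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"))  # pitch
    # GTCC and HNR would come from a gammatone / voice-analysis toolbox and are omitted here.
    feats = [mfcc, delta, centroid, rolloff, f0[np.newaxis, :]]
    return np.concatenate([np.hstack([f.mean(axis=1), f.std(axis=1)]) for f in feats])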
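Finally, the fusion, PCA, and heterogeneous-ensemble stages map naturally onto standard scikit-learn components. The sketch below uses synthetic placeholder features; the soft-voting scheme, 95% explained-variance threshold, and all hyper-parameters are assumptions rather than values reported in the paper.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_visual = rng.normal(size=(200, 2560))   # placeholder spatio-textural features (GLCM + CNN)
X_audio = rng.normal(size=(200, 60))      # placeholder acoustic features (MFCC, deltas, ...)
y = rng.integers(0, 2, size=200)          # 0 = non-violent, 1 = violent

X = np.hstack([X_visual, X_audio])        # early fusion of the two modalities

ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True)),
        ("dt", DecisionTreeClassifier()),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("nb", GaussianNB()),
        ("rf", RandomForestClassifier(n_estimators=200)),
    ],
    voting="soft",                        # soft voting assumed; the paper does not specify
)

model = make_pipeline(
    StandardScaler(),                     # z-score normalization of the fused features
    PCA(n_components=0.95),               # keep components explaining ~95% variance (assumed)
    ensemble,
)
model.fit(X, y)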
Source Journal
Engineering Science and Technology-An International Journal-Jestech
CiteScore: 11.20
Self-citation rate: 3.50%
Articles published per year: 153
Review time: 22 days
Journal description: Engineering Science and Technology, an International Journal (JESTECH) (formerly Technology), a peer-reviewed quarterly engineering journal, publishes both theoretical and experimental high-quality papers of permanent interest, not previously published in journals, in the field of engineering and applied science, and aims to promote the theory and practice of technology and engineering. In addition to peer-reviewed original research papers, the Editorial Board welcomes original research reports, state-of-the-art reviews and communications in the broadly defined field of engineering science and technology. The scope of JESTECH includes a wide spectrum of subjects, including:
- Electrical/Electronics and Computer Engineering (Biomedical Engineering and Instrumentation; Coding, Cryptography, and Information Protection; Communications, Networks, Mobile Computing and Distributed Systems; Compilers and Operating Systems; Computer Architecture, Parallel Processing, and Dependability; Computer Vision and Robotics; Control Theory; Electromagnetic Waves, Microwave Techniques and Antennas; Embedded Systems; Integrated Circuits, VLSI Design, Testing, and CAD; Microelectromechanical Systems; Microelectronics, and Electronic Devices and Circuits; Power, Energy and Energy Conversion Systems; Signal, Image, and Speech Processing)
- Mechanical and Civil Engineering (Automotive Technologies; Biomechanics; Construction Materials; Design and Manufacturing; Dynamics and Control; Energy Generation, Utilization, Conversion, and Storage; Fluid Mechanics and Hydraulics; Heat and Mass Transfer; Micro-Nano Sciences; Renewable and Sustainable Energy Technologies; Robotics and Mechatronics; Solid Mechanics and Structure; Thermal Sciences)
- Metallurgical and Materials Engineering (Advanced Materials Science; Biomaterials; Ceramic and Inorganic Materials; Electronic-Magnetic Materials; Energy and Environment; Materials Characterization; Metallurgy; Polymers and Nanocomposites)