Shot classification for human behavioural analysis in video surveillance applications

Newlin Shebiah R, Arivazhagan S
{"title":"Shot classification for human behavioural analysis in video surveillance applications","authors":"Newlin Shebiah R, Arivazhagan S","doi":"10.5565/rev/elcvia.1713","DOIUrl":null,"url":null,"abstract":"Human behavior analysis plays a vital role in ensuring security and safety of people in crowded public places against diverse contexts like theft detection, violence prevention, explosion anticipation etc. Analysing human behaviour by classifying of videos in to different shot types helps in extracting appropriate behavioural cues. Shots indicates the subject size within the frame and the basic camera shots include: the close-up, medium shot, and the long shot. If the video is categorised as Close-up shot type, investigating emotional displays helps in identifying criminal suspects by analysing the signs of aggressiveness and nervousness to prevent illegal acts. Mid shot can be used for analysing nonverbal communication like clothing, facial expressions, gestures and personal space. For long shot type, behavioural analysis is by extracting the cues from gait and atomic action displayed by the person. Here, the framework for shot scale analysis for video surveillance applications is by using Face pixel percentage and deep learning based method. Face Pixel ratio corresponds to the percentage of region occupied by the face region in a frame. The Face pixel Ratio is thresholded with predefined threshold values and grouped into Close-up shot, mid shot and long shot categories. Shot scale analysis based on transfer learning utilizes effective pre-trained models that includes AlexNet, VGG Net, GoogLeNet and ResNet. From experimentation, it is observed that, among the pre-trained models used for experimentation GoogLeNet tops with the accuracy of 94.61%.","PeriodicalId":38711,"journal":{"name":"Electronic Letters on Computer Vision and Image Analysis","volume":"209 ","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Electronic Letters on Computer Vision and Image Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5565/rev/elcvia.1713","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 0

Abstract

Human behaviour analysis plays a vital role in ensuring the security and safety of people in crowded public places across diverse contexts such as theft detection, violence prevention and explosion anticipation. Classifying surveillance videos into different shot types helps in extracting the appropriate behavioural cues. Shot scale indicates the size of the subject within the frame, and the basic camera shots are the close-up, the mid (medium) shot and the long shot. If a video is categorised as a close-up shot, investigating emotional displays helps in identifying criminal suspects by analysing signs of aggressiveness and nervousness in order to prevent illegal acts. A mid shot can be used for analysing nonverbal communication such as clothing, facial expressions, gestures and personal space. For the long shot, behavioural analysis is performed by extracting cues from the gait and atomic actions displayed by the person. Here, a framework for shot scale analysis in video surveillance applications is presented using a face pixel percentage method and a deep learning based method. The Face Pixel Ratio corresponds to the percentage of a frame occupied by the face region. The Face Pixel Ratio is compared against predefined threshold values, and each frame is grouped into the close-up, mid shot or long shot category. Shot scale analysis based on transfer learning utilises effective pre-trained models including AlexNet, VGGNet, GoogLeNet and ResNet. From the experiments, it is observed that, among the pre-trained models evaluated, GoogLeNet performs best with an accuracy of 94.61%.
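
As a rough illustration of the Face Pixel Ratio stage described above, the sketch below detects a face in a frame, computes the ratio of the face area to the frame area, and maps that ratio to a shot class. The OpenCV Haar-cascade detector and the two threshold values are assumptions for illustration only; the abstract states that predefined thresholds are used but does not give them.

```python
import cv2

# Hypothetical thresholds on the face-to-frame area ratio; the paper only
# says the Face Pixel Ratio is compared against predefined values.
CLOSE_UP_THRESHOLD = 0.15   # assumed: face covers >= 15% of the frame
MID_SHOT_THRESHOLD = 0.03   # assumed: face covers >= 3% of the frame

# Haar-cascade face detector shipped with OpenCV (a stand-in for whichever
# face detector the authors actually used).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def classify_shot(frame):
    """Classify a BGR frame as 'close-up', 'mid' or 'long' from the
    percentage of the frame occupied by the largest detected face."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return "long"  # no visible face: treat as a long shot

    frame_area = frame.shape[0] * frame.shape[1]
    # Face Pixel Ratio: area of the largest face box over the frame area.
    face_ratio = max(w * h for (_, _, w, h) in faces) / frame_area

    if face_ratio >= CLOSE_UP_THRESHOLD:
        return "close-up"
    if face_ratio >= MID_SHOT_THRESHOLD:
        return "mid"
    return "long"
```

For the transfer learning branch, a minimal sketch of adapting a pre-trained GoogLeNet (the best-performing model in the experiments) to the three shot classes could look as follows. PyTorch/torchvision is an assumption here; the abstract does not name a framework or the fine-tuning details.

```python
import torch.nn as nn
from torchvision import models

# Load a GoogLeNet pre-trained on ImageNet (assumed framework: torchvision).
model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)

# Optionally freeze the pre-trained backbone and fine-tune only the new head.
for param in model.parameters():
    param.requires_grad = False

# Replace the 1000-way ImageNet classifier with a 3-way head:
# close-up, mid shot, long shot.
model.fc = nn.Linear(model.fc.in_features, 3)
```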