Contextual video clip classification

S. Guler, Ashutosh Morde, Ian A. Pushee, Xiang Ma, Jason A. Silverstein, S. McAuliffe
{"title":"Contextual video clip classification","authors":"S. Guler, Ashutosh Morde, Ian A. Pushee, Xiang Ma, Jason A. Silverstein, S. McAuliffe","doi":"10.1109/AIPR.2012.6528196","DOIUrl":null,"url":null,"abstract":"Content based classification of unrestricted video clips from various sources plays an important role in video analysis and search. Thus far automated video understanding research focused on videos from sources such as aerial, broadcast, meeting room etc. For each of these video sources certain assumptions are made which constrain the problem of content analysis. None of these assumptions hold for analyzing the contents of unrestricted videos. We present a top down approach to content based video classification by first understanding the overall scene structure and then detecting the actors, actions and objects along with the context they interact in as well as the global motion information from the scene. A scene in a video clip is used as a semantic unit providing the visual context and the location characteristics such as indoor, outdoor and type of each associated with the scene. The location context is tied with the video shooting style of zooming in and out to create a scene description hierarchy. Actors are considered as detected people and faces, certain poses of people help define the action and activities, while objects relevant to certain types of events provide additional context. Summary features are created for the scene semantic units based on the actors, actions, object detections and the context. These features were successfully used to train an asymmetric Random Forest classifier for video event classification. The top down approach we present here has the inherent advantage of being able to describe the video in addition to providing content based classification. The approach was tested on the Multimedia Event Detection (MED) 2011 dataset with promising results.","PeriodicalId":406942,"journal":{"name":"2012 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AIPR.2012.6528196","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Content-based classification of unrestricted video clips from diverse sources plays an important role in video analysis and search. Thus far, automated video understanding research has focused on videos from constrained sources such as aerial, broadcast, and meeting-room footage. For each of these sources, certain assumptions are made that constrain the content-analysis problem; none of these assumptions hold when analyzing the contents of unrestricted videos. We present a top-down approach to content-based video classification that first understands the overall scene structure and then detects the actors, actions, and objects, along with the context in which they interact and the global motion information from the scene. A scene in a video clip is used as a semantic unit providing the visual context and location characteristics associated with the scene, such as whether it is indoor or outdoor and the type of each. The location context is tied to the video-shooting style of zooming in and out to create a scene-description hierarchy. Actors are taken to be detected people and faces; certain poses of people help define actions and activities, while objects relevant to certain types of events provide additional context. Summary features are created for the scene semantic units based on the actors, actions, object detections, and context. These features were successfully used to train an asymmetric Random Forest classifier for video event classification. The top-down approach presented here has the inherent advantage of being able to describe the video in addition to providing content-based classification. The approach was tested on the Multimedia Event Detection (MED) 2011 dataset with promising results.
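The paper does not publish an implementation, but the pipeline the abstract describes (pool per-scene detections into a summary feature vector, then classify events with a Random Forest) can be sketched concretely. The following is a minimal sketch assuming scikit-learn; the dictionary keys (`person_scores`, `pose_scores`, `object_scores`, `location_context`, `global_motion`), the pooling choices, and the use of `class_weight="balanced"` as a stand-in for the paper's asymmetric Random Forest weighting are all assumptions for illustration, not details taken from the paper.

```python
# Sketch of the summary-feature + classifier stage described in the abstract.
# Feature names and the class-weight choice are hypothetical: the paper's
# exact asymmetric weighting scheme is not given, so scikit-learn's
# class_weight option is used here as an approximation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def scene_summary_feature(scene):
    """Pool per-frame detections for one scene (the semantic unit) into a
    fixed-length vector of actor, action, object, and context statistics."""
    return np.concatenate([
        np.mean(scene["person_scores"], axis=0),   # actor evidence: people and faces
        np.max(scene["pose_scores"], axis=0),      # action-defining poses
        np.max(scene["object_scores"], axis=0),    # event-relevant objects
        scene["location_context"],                 # indoor/outdoor and scene type
        scene["global_motion"],                    # global motion statistics
    ])


def train_event_classifier(scenes, labels):
    """Train the event classifier from per-scene detection dicts and the
    event class of the clip each scene came from (hypothetical inputs)."""
    X = np.stack([scene_summary_feature(s) for s in scenes])
    clf = RandomForestClassifier(n_estimators=200, class_weight="balanced")
    clf.fit(X, labels)
    return clf
```

One plausible reading of the pooling choices: mean-pooling suits persistent evidence such as the presence of people across a scene, while max-pooling suits transient but decisive cues such as a brief action pose or a single appearance of an event-relevant object.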