Harnessing Object and Scene Semantics for Large-Scale Video Understanding

Zuxuan Wu, Yanwei Fu, Yu-Gang Jiang, L. Sigal
{"title":"Harnessing Object and Scene Semantics for Large-Scale Video Understanding","authors":"Zuxuan Wu, Yanwei Fu, Yu-Gang Jiang, L. Sigal","doi":"10.1109/CVPR.2016.339","DOIUrl":null,"url":null,"abstract":"Large-scale action recognition and video categorization are important problems in computer vision. To address these problems, we propose a novel object-and scene-based semantic fusion network and representation. Our semantic fusion network combines three streams of information using a three-layer neural network: (i) frame-based low-level CNN features, (ii) object features from a state-of-the-art large-scale CNN object-detector trained to recognize 20K classes, and (iii) scene features from a state-of-the-art CNN scene-detector trained to recognize 205 scenes. The trained network achieves improvements in supervised activity and video categorization in two complex large-scale datasets - ActivityNet and FCVID, respectively. Further, by examining and back propagating information through the fusion network, semantic relationships (correlations) between video classes and objects/scenes can be discovered. These video class-object/video class-scene relationships can in turn be used as semantic representation for the video classes themselves. We illustrate effectiveness of this semantic representation through experiments on zero-shot action/video classification and clustering.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"7 1","pages":"3112-3121"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"87","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR.2016.339","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 87

Abstract

Large-scale action recognition and video categorization are important problems in computer vision. To address these problems, we propose a novel object-and scene-based semantic fusion network and representation. Our semantic fusion network combines three streams of information using a three-layer neural network: (i) frame-based low-level CNN features, (ii) object features from a state-of-the-art large-scale CNN object-detector trained to recognize 20K classes, and (iii) scene features from a state-of-the-art CNN scene-detector trained to recognize 205 scenes. The trained network achieves improvements in supervised activity and video categorization in two complex large-scale datasets - ActivityNet and FCVID, respectively. Further, by examining and back propagating information through the fusion network, semantic relationships (correlations) between video classes and objects/scenes can be discovered. These video class-object/video class-scene relationships can in turn be used as semantic representation for the video classes themselves. We illustrate effectiveness of this semantic representation through experiments on zero-shot action/video classification and clustering.
利用对象和场景语义进行大规模视频理解
大规模动作识别和视频分类是计算机视觉中的重要问题。为了解决这些问题,我们提出了一种新的基于对象和场景的语义融合网络和表示。我们的语义融合网络使用三层神经网络结合了三个信息流:(i)基于帧的低级CNN特征,(ii)来自最先进的大规模CNN对象检测器的对象特征,训练以识别20K个类别,以及(iii)来自最先进的CNN场景检测器的场景特征,训练以识别205个场景。训练后的网络分别在ActivityNet和FCVID两个复杂的大规模数据集上实现了监督活动和视频分类的改进。此外,通过融合网络检查和反向传播信息,可以发现视频类和对象/场景之间的语义关系(相关性)。这些视频类-对象/视频类-场景关系反过来可以用作视频类本身的语义表示。我们通过零镜头动作/视频分类和聚类实验说明了这种语义表示的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信