Towards a Scene-Based Video Annotation Framework

Fekade Getahun Taddesse, Mekuanent Birara
{"title":"Towards a Scene-Based Video Annotation Framework","authors":"Fekade Getahun Taddesse, Mekuanent Birara","doi":"10.1109/SITIS.2015.123","DOIUrl":null,"url":null,"abstract":"The amount of video in the web is huge and searching specific part of a video can be achieved using a content or text based video indexing, grouping, searching and retrieval approaches. To realize content based video searching, in this work we propose scene based video annotation for the identification and labeling of events and objects in a video with a descriptive text. Video annotation requires a knowledge base to define semantic meaning of events and objects in the video. Manual and semi-supervised video annotation approaches fail as both require expertise for the correct identification and labeling of video concepts. Annotation requires a great deal of concept dependency and relatedness processing to give descriptive statement to a scene in the video. This paper introduces a novel scene based video annotation framework to provide scene level semantic description of videos. The framework uses audio component of the scene to support event and object identification and has proper filtering and normalization. The framework provides concept relatedness, concept formulation, shot and scene level video annotations. To validate the capability of the proposed framework, we developed a prototype that shows a scene level video annotation. The framework is evaluated using a standard video processing evaluation dataset for its accuracy in event and object prediction, and the overall accuracy and usability of the system is evaluated using human ratings. The proposed approach exhibits 81% accuracy in object and event prediction and an average user rating of 3.47 out of 4 in overall system evaluation.","PeriodicalId":128616,"journal":{"name":"2015 11th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 11th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SITIS.2015.123","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

The amount of video on the web is huge, and searching for a specific part of a video can be achieved using content- or text-based video indexing, grouping, searching, and retrieval approaches. To realize content-based video searching, in this work we propose scene-based video annotation for identifying and labeling events and objects in a video with descriptive text. Video annotation requires a knowledge base to define the semantic meaning of events and objects in the video. Manual and semi-supervised video annotation approaches fail, as both require expertise for the correct identification and labeling of video concepts. Annotation requires a great deal of concept dependency and relatedness processing to produce a descriptive statement for a scene in the video. This paper introduces a novel scene-based video annotation framework that provides scene-level semantic descriptions of videos. The framework uses the audio component of the scene to support event and object identification and applies proper filtering and normalization. The framework provides concept relatedness, concept formulation, and shot- and scene-level video annotations. To validate the capability of the proposed framework, we developed a prototype that demonstrates scene-level video annotation. The framework is evaluated on a standard video processing evaluation dataset for its accuracy in event and object prediction, and the overall accuracy and usability of the system are evaluated using human ratings. The proposed approach exhibits 81% accuracy in object and event prediction and an average user rating of 3.47 out of 4 in overall system evaluation.
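The pipeline the abstract outlines (shot-level event and object concepts fused with audio concepts, then relatedness-based filtering to form a scene-level description) can be illustrated with a minimal sketch. The sketch below is hypothetical: the `Shot`/`Scene` structures, the toy `RELATEDNESS` table, and the threshold value are illustrative assumptions, not the paper's implementation; a real system would back the relatedness lookup with a knowledge base such as WordNet.

```python
from dataclasses import dataclass, field

# Toy pairwise relatedness scores standing in for the knowledge base the
# abstract mentions; a real system would query a lexical resource instead.
RELATEDNESS = {
    frozenset({"crowd", "cheering"}): 0.9,
    frozenset({"ball", "cheering"}): 0.6,
    frozenset({"crowd", "ball"}): 0.7,
    frozenset({"crowd", "engine"}): 0.1,
}


def relatedness(a: str, b: str) -> float:
    """Symmetric concept relatedness in [0, 1]; 0.0 for unknown pairs."""
    return RELATEDNESS.get(frozenset({a, b}), 0.0)


@dataclass
class Shot:
    """A contiguous run of frames with concepts detected per modality."""
    start: float  # seconds into the video
    end: float
    visual_concepts: list[str] = field(default_factory=list)
    audio_concepts: list[str] = field(default_factory=list)


@dataclass
class Scene:
    """A group of temporally adjacent, semantically coherent shots."""
    shots: list[Shot]


def annotate_scene(scene: Scene, threshold: float = 0.5) -> str:
    """Fuse visual and audio concepts across shots, drop weakly related
    outliers, and emit a descriptive label for the scene."""
    concepts = {c for s in scene.shots
                for c in s.visual_concepts + s.audio_concepts}
    # Filtering step: keep a concept only if it relates strongly enough
    # to at least one other concept observed in the same scene.
    kept = {a for a in concepts
            if any(relatedness(a, b) >= threshold for b in concepts - {a})}
    start, end = scene.shots[0].start, scene.shots[-1].end
    return f"[{start:.1f}s-{end:.1f}s] scene featuring: " + ", ".join(sorted(kept))


if __name__ == "__main__":
    scene = Scene(shots=[
        Shot(0.0, 4.2, visual_concepts=["crowd", "ball"]),
        Shot(4.2, 9.8, visual_concepts=["crowd"],
             audio_concepts=["cheering", "engine"]),
    ])
    print(annotate_scene(scene))
    # -> [0.0s-9.8s] scene featuring: ball, cheering, crowd
```

In this sketch, "engine" is discarded because it relates weakly to every other concept in the scene; keeping a concept only when it relates to at least one other is one simple way to realize the filtering the abstract mentions. The paper's actual normalization and concept formulation steps are not reconstructed here.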