视频理解的多模态融合

A. Hoogs, J. Mundy, G. Cross
{"title":"视频理解的多模态融合","authors":"A. Hoogs, J. Mundy, G. Cross","doi":"10.1109/AIPR.2001.991210","DOIUrl":null,"url":null,"abstract":"The exploitation of semantic information in computer vision problems can be difficult because of the large difference in representations and levels of knowledge. Image analysis is formulated in terms of low-level features describing image structure and intensity, while high-level knowledge such as purpose and common sense are encoded in abstract, non-geometric representations. In this work we attempt to bridge this gap through the integration of image analysis algorithms with WordNet, a large semantic network that explicitly links related words in a hierarchical structure. Our problem domain is the understanding of broadcast news, as this provides both linguistic information in the transcript and video information. Visual detection algorithms such as face detection and object tracking are applied to the video to extract basic object information, which is indexed into WordNet. The transcript provides topic information in the form of detected keywords. Together, both types of information are used to constrain a search within WordNet for a description of the video content in terms of the most likely WordNet concepts. This project is in its early stages; the general ideas and concepts are presented here.","PeriodicalId":277181,"journal":{"name":"Proceedings 30th Applied Imagery Pattern Recognition Workshop (AIPR 2001). Analysis and Understanding of Time Varying Imagery","volume":"218 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Multi-modal fusion for video understanding\",\"authors\":\"A. Hoogs, J. Mundy, G. Cross\",\"doi\":\"10.1109/AIPR.2001.991210\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The exploitation of semantic information in computer vision problems can be difficult because of the large difference in representations and levels of knowledge. Image analysis is formulated in terms of low-level features describing image structure and intensity, while high-level knowledge such as purpose and common sense are encoded in abstract, non-geometric representations. In this work we attempt to bridge this gap through the integration of image analysis algorithms with WordNet, a large semantic network that explicitly links related words in a hierarchical structure. Our problem domain is the understanding of broadcast news, as this provides both linguistic information in the transcript and video information. Visual detection algorithms such as face detection and object tracking are applied to the video to extract basic object information, which is indexed into WordNet. The transcript provides topic information in the form of detected keywords. Together, both types of information are used to constrain a search within WordNet for a description of the video content in terms of the most likely WordNet concepts. This project is in its early stages; the general ideas and concepts are presented here.\",\"PeriodicalId\":277181,\"journal\":{\"name\":\"Proceedings 30th Applied Imagery Pattern Recognition Workshop (AIPR 2001). Analysis and Understanding of Time Varying Imagery\",\"volume\":\"218 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2001-10-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings 30th Applied Imagery Pattern Recognition Workshop (AIPR 2001). Analysis and Understanding of Time Varying Imagery\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/AIPR.2001.991210\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 30th Applied Imagery Pattern Recognition Workshop (AIPR 2001). Analysis and Understanding of Time Varying Imagery","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AIPR.2001.991210","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

摘要

在计算机视觉问题中,由于知识的表示和水平的巨大差异,语义信息的开发可能是困难的。图像分析是根据描述图像结构和强度的低级特征来制定的,而目的和常识等高级知识则用抽象的非几何表示进行编码。在这项工作中,我们试图通过将图像分析算法与WordNet集成来弥合这一差距,WordNet是一个大型语义网络,可以在分层结构中明确地链接相关单词。我们的问题领域是对广播新闻的理解,因为这既提供了文字记录中的语言信息,也提供了视频信息。将人脸检测和目标跟踪等视觉检测算法应用到视频中,提取基本目标信息,并将其编入WordNet。文本以检测到的关键字的形式提供主题信息。这两种类型的信息一起用于在WordNet中约束搜索,以根据最可能的WordNet概念查找视频内容的描述。这个项目还处于早期阶段;这里给出了一般的思想和概念。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Multi-modal fusion for video understanding
The exploitation of semantic information in computer vision problems can be difficult because of the large difference in representations and levels of knowledge. Image analysis is formulated in terms of low-level features describing image structure and intensity, while high-level knowledge such as purpose and common sense are encoded in abstract, non-geometric representations. In this work we attempt to bridge this gap through the integration of image analysis algorithms with WordNet, a large semantic network that explicitly links related words in a hierarchical structure. Our problem domain is the understanding of broadcast news, as this provides both linguistic information in the transcript and video information. Visual detection algorithms such as face detection and object tracking are applied to the video to extract basic object information, which is indexed into WordNet. The transcript provides topic information in the form of detected keywords. Together, both types of information are used to constrain a search within WordNet for a description of the video content in terms of the most likely WordNet concepts. This project is in its early stages; the general ideas and concepts are presented here.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信