Video2Text: Learning to Annotate Video Content

2009 IEEE International Conference on Data Mining Workshops. Pub Date: 2009-12-06. DOI: 10.1109/ICDMW.2009.79

H. Aradhye, G. Toderici, J. Yagnik


Citations: 68

Abstract

This paper discusses a new method for automatic discovery and organization of descriptive concepts (labels) within large real-world corpora of user-uploaded multimedia, such as YouTube.com. Conversely, it also provides validation of existing labels, if any. While training, our method does not assume any explicit manual annotation other than the weak labels already available in the form of video title, description, and tags. Prior work related to such auto-annotation assumed that a vocabulary of labels of interest (e.g., indoor, outdoor, city, landscape) is specified a priori. In contrast, the proposed method begins with an empty vocabulary. It analyzes audiovisual features of 25 million YouTube.com videos (nearly 150 years of video data), effectively searching for consistent correlation between these features and text metadata. It autonomously extends the label vocabulary as and when it discovers concepts it can reliably identify, eventually leading to a vocabulary with thousands of labels and growing. We believe that this work significantly extends the state of the art in multimedia data mining, discovery, and organization based on the technical merit of the proposed ideas as well as the enormous scale of the mining exercise in a very challenging, unconstrained, noisy domain.
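The vocabulary-growing idea in the abstract can be illustrated with a small sketch. The abstract does not specify the features, the classifier, or the acceptance criterion, so everything here is an assumption made for illustration: the 2-D toy feature vectors, the nearest-centroid classifier, the even/odd train/holdout split, and the names `grow_vocabulary`, `min_support`, and `min_precision` are all hypothetical and are not the authors' method. The sketch starts from an empty vocabulary, treats each metadata token as a candidate label, and admits it only if a model trained on the weak labels predicts it with high precision on held-out videos.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# A video is (feature vector, set of weak-label tokens from title/description/tags).
Video = Tuple[Sequence[float], Set[str]]


def centroid(rows: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def holdout_precision(videos: List[Video], label: str) -> float:
    """Fit a nearest-centroid model on even-indexed videos, score on odd ones.

    Stand-in classifier chosen for brevity; the paper's model is not
    described in the abstract.
    """
    train, test = videos[0::2], videos[1::2]
    pos = [f for f, toks in train if label in toks]
    neg = [f for f, toks in train if label not in toks]
    if not pos or not neg:
        return 0.0
    c_pos, c_neg = centroid(pos), centroid(neg)
    # Weak-label truth for every held-out video the model predicts as positive.
    hits = [label in toks for f, toks in test
            if sq_dist(f, c_pos) < sq_dist(f, c_neg)]
    return sum(hits) / len(hits) if hits else 0.0


def grow_vocabulary(videos: List[Video],
                    min_support: int = 2,
                    min_precision: float = 0.8) -> Dict[str, float]:
    """Start with an empty vocabulary; admit tokens that are reliably predictable."""
    vocabulary: Dict[str, float] = {}
    counts = Counter(tok for _, toks in videos for tok in toks)
    for label, n in sorted(counts.items()):
        if n < min_support:
            continue  # too rare to evaluate
        precision = holdout_precision(videos, label)
        if precision >= min_precision:
            vocabulary[label] = precision
    return vocabulary


# Toy corpus: "soccer" and "piano" track the features; "funny" does not.
TOY_VIDEOS: List[Video] = [
    ([1.0, 0.1], {"soccer", "funny"}),
    ([0.9, 0.0], {"soccer"}),
    ([0.1, 1.0], {"piano"}),
    ([0.0, 0.9], {"piano", "funny"}),
    ([0.8, 0.2], {"soccer"}),
    ([1.0, 0.0], {"soccer", "funny"}),
    ([0.2, 0.9], {"piano", "funny"}),
    ([0.1, 0.8], {"piano"}),
]

vocabulary = grow_vocabulary(TOY_VIDEOS)
print(sorted(vocabulary))
```

On this toy corpus the tokens whose presence correlates with the features are admitted, while a tag that appears regardless of content is rejected. The same precision check, applied to a label a video already carries, gives a rough picture of what validating existing labels could look like.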
