Video Google: a text retrieval approach to object matching in videos

Proceedings Ninth IEEE International Conference on Computer Vision Pub Date : 2003-10-13 DOI:10.1109/ICCV.2003.1238663

Josef Sivic, Andrew Zisserman


Citations: 7002

Abstract

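We describe an approach to object and scene retrieval which searches for and localizes all occurrences of a user-outlined object in a video. The object is represented by a set of viewpoint-invariant region descriptors, so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video within a shot is used to track the regions, in order to reject unstable regions and reduce the effects of noise in the descriptors. The analogy with text retrieval is in the implementation, where matches on descriptors are pre-computed (using vector quantization), and inverted file systems and document rankings are used. The result is that retrieval is immediate, returning a ranked list of key frames/shots in the manner of Google. The method is illustrated for matching in two full-length feature films.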
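The text-retrieval analogy in the abstract (vector quantization, an inverted file, and document ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the weighting or the similarity measure, so tf-idf weighting and cosine similarity are assumed here as the usual text-retrieval choices. The vocabulary of visual words is assumed to be given already (for example from clustering), the descriptors are toy 2-D points, and every function and variable name is hypothetical.

```python
import math
from collections import Counter, defaultdict


def quantize(descriptor, vocabulary):
    """Map a descriptor to the index of its nearest visual word (squared L2)."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((d - c) ** 2 for d, c in zip(descriptor, vocabulary[i])),
    )


def build_index(frames, vocabulary):
    """Precompute per-frame word counts and an inverted file: word -> frame ids."""
    inverted = defaultdict(set)
    counts = {}
    for frame_id, descriptors in frames.items():
        words = Counter(quantize(d, vocabulary) for d in descriptors)
        counts[frame_id] = words
        for word in words:
            inverted[word].add(frame_id)
    return inverted, counts


def tfidf(words, inverted, n_frames):
    """Assumed weighting: term frequency times inverse document frequency."""
    total = sum(words.values())
    return {
        w: (c / total) * math.log(n_frames / len(inverted[w]))
        for w, c in words.items()
        if w in inverted  # skip words that never occur in the indexed frames
    }


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm > 0 else 0.0


def search(query_descriptors, vocabulary, inverted, counts):
    """Rank only the frames that share at least one visual word with the query."""
    n_frames = len(counts)
    query_words = Counter(quantize(d, vocabulary) for d in query_descriptors)
    query = tfidf(query_words, inverted, n_frames)
    candidates = set()
    for word in query:
        candidates |= inverted[word]  # inverted-file lookup, no full scan
    scored = [
        (f, cosine(query, tfidf(counts[f], inverted, n_frames))) for f in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): three visual words and three key frames.
vocabulary = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
frames = {
    "f1": [(0.1, 0.2), (9.8, 10.1)],
    "f2": [(0.2, 9.9), (0.1, 10.2)],
    "f3": [(9.9, 9.7), (10.2, 10.1)],
}
inverted, counts = build_index(frames, vocabulary)
ranked = search([(10.1, 9.9)], vocabulary, inverted, counts)
print(ranked)
```

In this toy run the query descriptor quantizes to the second visual word, so only the two frames containing that word are scored and the frame made up entirely of that word ranks first. Because quantization and the inverted file are built once, ahead of any query, the query itself reduces to a few lookups, which is the sense in which the abstract calls retrieval immediate.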



