Multi-modal, multi-resource methods for placing Flickr videos on the map

Proceedings of the 1st ACM International Conference on Multimedia Retrieval
Pub Date: 2011-04-18
DOI: 10.1145/1991996.1992048

P. Kelm, S. Schmiedeke, T. Sikora

Citations: 28

Abstract

We present three approaches for placing videos in Flickr on the world map. The toponym extraction and geo lookup approach makes use of external resources to identify toponyms in the metadata and associate them with geo-coordinates. The metadata-based region model approach uses a k-nearest-neighbour classifier trained over geographical regions. Videos are represented using their metadata in a text space with reduced dimensionality. The visual region model approach uses a support vector machine also trained over geographical regions. Videos are represented using low-level feature vectors from multiple key frames. Voting methods are used to form a single decision for each video. We compare the approaches experimentally, highlighting the importance of using appropriate metadata features and suitable regions as the basis of the region model. The best performance is achieved by the geo-lookup approach used with fallback to the visual region model when the video metadata contains no toponym.
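The best-performing combination described above (geo lookup first, with fallback to a region model when the metadata contains no toponym) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the toy gazetteer, the region names and centroids, the use of a simple majority vote over per-key-frame predictions, and the choice to return a region centroid as the final coordinate are all hypothetical stand-ins, since the abstract does not specify them.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)

# Hypothetical toy gazetteer; the abstract only says "external resources".
GAZETTEER: Dict[str, Coord] = {
    "berlin": (52.52, 13.405),
    "paris": (48.8566, 2.3522),
}

# Hypothetical regions and centroids; the paper's actual regions are not given here.
REGION_CENTROIDS: Dict[str, Coord] = {
    "europe": (50.0, 10.0),
    "north_america": (40.0, -100.0),
}


def geo_lookup(metadata_tokens: Sequence[str]) -> Optional[Coord]:
    """Return coordinates of the first token found in the gazetteer, else None."""
    for token in metadata_tokens:
        coord = GAZETTEER.get(token.lower())
        if coord is not None:
            return coord
    return None


def vote_region(frame_predictions: Sequence[str]) -> str:
    """Majority vote over per-key-frame region labels (assumed voting scheme)."""
    if not frame_predictions:
        raise ValueError("need at least one key-frame prediction")
    return Counter(frame_predictions).most_common(1)[0][0]


def place_video(metadata_tokens: Sequence[str],
                frame_predictions: Sequence[str]) -> Coord:
    """Use geo lookup when a toponym is present; otherwise fall back to the
    voted visual region and return that region's centroid (an assumption)."""
    coord = geo_lookup(metadata_tokens)
    if coord is not None:
        return coord
    return REGION_CENTROIDS[vote_region(frame_predictions)]
```

In this sketch, `frame_predictions` stands for region labels already produced by some per-key-frame classifier (for example a support vector machine); training that classifier is outside the scope of the example.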
