Geographic information use in weakly-supervised deep learning for landmark recognition

2017 IEEE International Conference on Multimedia and Expo (ICME), published 2017-07-10, DOI: 10.1109/ICME.2017.8019376

Yifang Yin, Zhenguang Liu, Roger Zimmermann

Citations: 7

Abstract

Successful deep convolutional neural networks for visual object recognition typically rely on massive numbers of training images that have been carefully annotated, at great human effort, with class labels or object bounding boxes. Here we explore the use of geographic metadata, retrieved automatically from sensors such as GPS and compass, in weakly-supervised learning for landmark recognition. The visibility of a landmark in a frame can be calculated from the camera's field of view and the landmark's geometric information, such as its location and height. A training dataset is then generated as the union of all frames in which at least one target landmark is present. To reduce the impact of the noise inherent in the geo-metadata, we present a frame selection method that removes mislabeled frames in two steps: (1) Gaussian Mixture Model clustering based on camera location, followed by (2) outlier removal based on visual consistency. We compare the classification results obtained with ground-truth labels against those obtained with noisy labels derived from the raw geo-metadata. Experiments show that training on the raw geo-metadata achieves a Mean Average Precision (MAP) of 0.797. Moreover, applying our proposed representative frame selection method improves the MAP by a further 6.4%, which indicates that geo-metadata hold promise for weakly-supervised learning.
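The abstract derives weak labels by computing each landmark's visibility from the camera's field of view and then keeping the union of frames that see at least one target landmark. The sketch below illustrates that labelling step under stated assumptions. It uses a 2-D sector ("pie-slice") field-of-view model defined by camera position, compass heading, a viewable angle and a maximum visible distance, in local planar coordinates in metres. The names `Frame`, `Landmark`, `is_visible` and `build_training_set`, and the defaults `view_angle=60.0` and `max_dist=500.0`, are hypothetical. The landmark height that the paper also uses is not modelled here.

```python
import math
from dataclasses import dataclass


@dataclass
class Frame:
    frame_id: str
    x: float        # camera easting in metres (local planar coordinates, assumed)
    y: float        # camera northing in metres
    heading: float  # compass bearing in degrees, 0 = north, clockwise


@dataclass
class Landmark:
    name: str
    x: float  # landmark easting in metres
    y: float  # landmark northing in metres
    # Height is ignored in this sketch; the paper's use of it is not reproduced.


def is_visible(frame: Frame, landmark: Landmark,
               view_angle: float = 60.0, max_dist: float = 500.0) -> bool:
    """Hypothetical sector FOV test: the landmark must lie within max_dist
    metres of the camera and within +/- view_angle/2 degrees of its heading."""
    dx = landmark.x - frame.x
    dy = landmark.y - frame.y
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > max_dist:
        return False  # degenerate (camera on the landmark) or too far away
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0  # clockwise from north
    # Signed angular offset from the heading, wrapped into [-180, 180).
    offset = (bearing - frame.heading + 180.0) % 360.0 - 180.0
    return abs(offset) <= view_angle / 2.0


def build_training_set(frames, landmarks, **fov):
    """Label every frame with the landmarks it sees and keep only frames
    that see at least one target landmark (the union in the abstract)."""
    dataset = {}
    for frame in frames:
        labels = {lm.name for lm in landmarks if is_visible(frame, lm, **fov)}
        if labels:
            dataset[frame.frame_id] = labels
    return dataset
```

Passing `view_angle` or `max_dist` as keywords to `build_training_set` forwards them to `is_visible`, so the same hypothetical FOV parameters apply to every frame. A camera at the origin facing north labels a landmark 100 m due north. The same camera facing south labels nothing, so that frame is left out of the training set.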
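The first filtering step in the abstract is Gaussian Mixture Model clustering of camera locations. The abstract does not say how the clusters are then used to discard frames. This sketch assumes, hypothetically, that frames falling into low-weight components (sparse, unusual shooting positions) are treated as likely mislabeled. It fits a 2-D, diagonal-covariance GMM with expectation-maximization in plain Python, using a deterministic farthest-point initialisation. The functions `fit_gmm_2d` and `select_by_location` and the parameters `k` and `min_weight` are illustrative choices, not the paper's.

```python
import math


def _log_density(p, weight, mean, var):
    """Log of weight * N(p | mean, diag(var)) for a 2-D point."""
    return math.log(weight) - 0.5 * sum(
        math.log(2.0 * math.pi * var[d]) + (p[d] - mean[d]) ** 2 / var[d]
        for d in range(2))


def _responsibilities(points, weights, means, variances):
    """E-step: posterior probability of each component for each point."""
    resp = []
    for p in points:
        logs = [_log_density(p, w, m, v) for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # subtract the max before exp (log-sum-exp trick)
        exps = [math.exp(l - top) for l in logs]
        total = sum(exps)
        resp.append([e / total for e in exps])
    return resp


def fit_gmm_2d(points, k, iters=100, min_var=1e-6):
    """Fit a k-component, diagonal-covariance GMM to 2-D points with EM."""
    n = len(points)
    # Deterministic farthest-point initialisation of the component means.
    means = [list(points[0])]
    while len(means) < k:
        far = max(points, key=lambda p: min(
            (p[0] - m[0]) ** 2 + (p[1] - m[1]) ** 2 for m in means))
        means.append(list(far))
    # Start every component with the global per-axis variance.
    centre = [sum(p[d] for p in points) / n for d in range(2)]
    global_var = [max(sum((p[d] - centre[d]) ** 2 for p in points) / n, min_var)
                  for d in range(2)]
    variances = [list(global_var) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp = _responsibilities(points, weights, means, variances)
        for j in range(k):  # M-step
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # empty component: keep its previous parameters
            weights[j] = nj / n
            means[j] = [sum(r[j] * p[d] for r, p in zip(resp, points)) / nj
                        for d in range(2)]
            variances[j] = [max(sum(r[j] * (p[d] - means[j][d]) ** 2
                                    for r, p in zip(resp, points)) / nj, min_var)
                            for d in range(2)]
        s = sum(weights)
        weights = [w / s for w in weights]  # keep the mixture normalised
    return weights, means, variances


def select_by_location(frame_ids, points, k=3, min_weight=0.2):
    """Hypothetical rule: keep frames whose most likely component carries
    at least min_weight of the mixture; sparse components count as noise."""
    weights, means, variances = fit_gmm_2d(points, k)
    resp = _responsibilities(points, weights, means, variances)
    kept = []
    for fid, r in zip(frame_ids, resp):
        best = max(range(k), key=lambda j: r[j])
        if weights[best] >= min_weight:
            kept.append(fid)
    return kept
```

A library implementation such as scikit-learn's `GaussianMixture` (with `covariance_type="diag"`) could replace `fit_gmm_2d`. The hand-written EM is used here only so the sketch has no dependencies.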
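The second filtering step removes outliers based on visual consistency, but the abstract does not give the criterion. This sketch assumes that each frame of a given landmark already has a feature vector, for example from a pretrained CNN, which is an assumption. Each frame is scored by its cosine similarity to the leave-one-out mean of the other frames' features. Frames scoring more than `z` population standard deviations below the mean score are dropped. `remove_visual_outliers` and `z` are hypothetical names, and the paper may use a different consistency measure.

```python
import math
from statistics import mean, pstdev


def _cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0.0 and nb > 0.0 else 0.0


def remove_visual_outliers(features, z=1.0):
    """features maps frame_id -> feature vector for the frames of ONE landmark.
    Returns, in input order, the ids whose appearance agrees with the rest."""
    ids = list(features)
    if len(ids) < 3:
        return ids  # too few frames to judge consistency
    dim = len(features[ids[0]])
    total = [sum(features[i][d] for i in ids) for d in range(dim)]
    scores = {}
    for i in ids:
        # Leave-one-out centroid, so an outlier cannot pull the reference toward itself.
        others = [(total[d] - features[i][d]) / (len(ids) - 1) for d in range(dim)]
        scores[i] = _cosine(features[i], others)
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    threshold = mu - z * sd - 1e-9  # small tolerance for floating-point ties
    return [i for i in ids if scores[i] >= threshold]
```

The leave-one-out centroid matters most for small landmark sets. With a plain mean, a single mislabeled frame would shift the centroid noticeably and partly mask itself.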
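The results are reported as Mean Average Precision (MAP). The abstract does not state which average-precision variant was used, so the sketch below shows one common, non-interpolated definition. For each class (landmark), rank the test items by classifier score and average the precision measured at every rank where a true positive appears. MAP is the mean of these per-class values. The function names are illustrative.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class. scores[i] is the classifier score of
    item i; labels[i] is 1 if item i truly belongs to the class, else 0."""
    # Rank by descending score; Python's sort is stable, so ties keep input order.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, relevant) in enumerate(ranked, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """Average the per-class AP; row c of each matrix holds class c's data."""
    aps = [average_precision(s, l) for s, l in zip(score_matrix, label_matrix)]
    return sum(aps) / len(aps)
```

For one class with scores 0.9, 0.8, 0.7, 0.6 and relevance 1, 0, 1, 0, the relevant items sit at ranks 1 and 3. Its AP is therefore (1/1 + 2/3) / 2 = 5/6.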
