Harvesting Deep Models for Cross-Lingual Image Annotation

Qijie Wei, Xiaoxu Wang, Xirong Li
{"title":"获取跨语言图像标注的深度模型","authors":"Qijie Wei, Xiaoxu Wang, Xirong Li","doi":"10.1145/3095713.3095751","DOIUrl":null,"url":null,"abstract":"This paper considers cross-lingual image annotation, harvesting deep visual models from one language to annotate images with labels from another language. This task cannot be accomplished by machine translation, as labels can be ambiguous and a translated vocabulary leaves us limited freedom to annotate images with appropriate labels. Given non-overlapping vocabularies between two languages, we formulate cross-lingual image annotation as a zero-shot learning problem. For cross-lingual label matching, we adapt zero-shot by replacing the current monolingual semantic embedding space by a bilingual alternative. In order to reduce both label ambiguity and redundancy we propose a simple yet effective approach called label-enhanced zero-shot learning. Using three state-of-the-art deep visual models, i.e., ResNet-152, GoogleNet-Shuffle and OpenImages, experiments on the test set of Flickr8k-CN demonstrate the viability of the proposed approach for cross-lingual image annotation.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Harvesting Deep Models for Cross-Lingual Image Annotation\",\"authors\":\"Qijie Wei, Xiaoxu Wang, Xirong Li\",\"doi\":\"10.1145/3095713.3095751\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper considers cross-lingual image annotation, harvesting deep visual models from one language to annotate images with labels from another language. This task cannot be accomplished by machine translation, as labels can be ambiguous and a translated vocabulary leaves us limited freedom to annotate images with appropriate labels. Given non-overlapping vocabularies between two languages, we formulate cross-lingual image annotation as a zero-shot learning problem. For cross-lingual label matching, we adapt zero-shot by replacing the current monolingual semantic embedding space by a bilingual alternative. In order to reduce both label ambiguity and redundancy we propose a simple yet effective approach called label-enhanced zero-shot learning. 
Using three state-of-the-art deep visual models, i.e., ResNet-152, GoogleNet-Shuffle and OpenImages, experiments on the test set of Flickr8k-CN demonstrate the viability of the proposed approach for cross-lingual image annotation.\",\"PeriodicalId\":310224,\"journal\":{\"name\":\"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing\",\"volume\":\"64 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-06-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3095713.3095751\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3095713.3095751","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

This paper considers cross-lingual image annotation, harvesting deep visual models from one language to annotate images with labels from another language. This task cannot be accomplished by machine translation, as labels can be ambiguous and a translated vocabulary leaves us limited freedom to annotate images with appropriate labels. Given non-overlapping vocabularies between two languages, we formulate cross-lingual image annotation as a zero-shot learning problem. For cross-lingual label matching, we adapt zero-shot by replacing the current monolingual semantic embedding space by a bilingual alternative. In order to reduce both label ambiguity and redundancy we propose a simple yet effective approach called label-enhanced zero-shot learning. Using three state-of-the-art deep visual models, i.e., ResNet-152, GoogleNet-Shuffle and OpenImages, experiments on the test set of Flickr8k-CN demonstrate the viability of the proposed approach for cross-lingual image annotation.
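
The abstract describes the matching step only at a high level: source-language predictions from a deep visual model are projected into a bilingual semantic embedding space, and target-language labels are ranked by proximity in that space. The sketch below illustrates one way such zero-shot label transfer could be realized, assuming a ConSE-style convex combination of label embeddings; the function names, vocabulary sizes, and random data are hypothetical placeholders, not the authors' actual implementation.

# A minimal sketch of cross-lingual label matching via a bilingual embedding
# space. Assumption: the image is embedded as a confidence-weighted average of
# its predicted source-language label vectors, then matched against
# target-language label vectors living in the same bilingual space.
import numpy as np

def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    # Row-normalize so that dot products equal cosine similarity.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)

def cross_lingual_annotate(image_scores: np.ndarray,
                           src_label_emb: np.ndarray,
                           tgt_label_emb: np.ndarray,
                           top_k_src: int = 10,
                           top_k_tgt: int = 5) -> np.ndarray:
    # image_scores : (num_src_labels,) confidences from a source-language visual model.
    # src_label_emb: (num_src_labels, d) source labels in the bilingual space.
    # tgt_label_emb: (num_tgt_labels, d) target labels in the same bilingual space.
    # Returns indices of the top_k_tgt target-language labels for the image.
    src_emb = l2_normalize(src_label_emb)
    tgt_emb = l2_normalize(tgt_label_emb)

    # Keep the most confident source-language predictions.
    top_src = np.argsort(-image_scores)[:top_k_src]
    weights = image_scores[top_src]

    # Embed the image as a confidence-weighted average of its predicted labels.
    image_vec = weights @ src_emb[top_src]
    image_vec /= np.linalg.norm(image_vec) + 1e-12

    # Rank target-language labels by cosine similarity in the bilingual space.
    sims = tgt_emb @ image_vec
    return np.argsort(-sims)[:top_k_tgt]

# Toy usage with random data standing in for real model outputs and embeddings.
rng = np.random.default_rng(0)
scores = rng.random(1000)               # e.g. ResNet-152 softmax over English labels
en_emb = rng.normal(size=(1000, 300))   # English label vectors in a bilingual space
zh_emb = rng.normal(size=(500, 300))    # Chinese label vectors in the same space
print(cross_lingual_annotate(scores, en_emb, zh_emb))

The key design choice mirrored here is that only the embedding space is bilingual; the visual models themselves stay untouched, which is what lets existing monolingual models be harvested for a new language.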