图片标签与世界知识:从视觉语义源学习标签关系

Proceedings of the 21st ACM international conference on Multimedia Pub Date : 2013-10-21 DOI:10.1145/2502081.2502113

Lexing Xie, Xuming He

{"title":"图片标签与世界知识:从视觉语义源学习标签关系","authors":"Lexing Xie, Xuming He","doi":"10.1145/2502081.2502113","DOIUrl":null,"url":null,"abstract":"This paper studies the use of everyday words to describe images. The common saying has it that 'a picture is worth a thousand words', here we ask which thousand? The proliferation of tagged social multimedia data presents a challenge to understanding collective tag-use at large scale -- one can ask if patterns from photo tags help understand tag-tag relations, and how it can be leveraged to improve visual search and recognition. We propose a new method to jointly analyze three distinct visual knowledge resources: Flickr, ImageNet/WordNet, and ConceptNet. This allows us to quantify the visual relevance of both tags learn their relationships. We propose a novel network estimation algorithm, Inverse Concept Rank, to infer incomplete tag relationships. We then design an algorithm for image annotation that takes into account both image and tag features. We analyze over 5 million photos with over 20,000 visual tags. The statistics from this collection leads to good results for image tagging, relationship estimation, and generalizing to unseen tags. This is a first step in analyzing picture tags and everyday semantic knowledge. Potential other applications include generating natural language descriptions of pictures, as well as validating and supplementing knowledge databases.","PeriodicalId":20448,"journal":{"name":"Proceedings of the 21st ACM international conference on Multimedia","volume":"6 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2013-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":"{\"title\":\"Picture tags and world knowledge: learning tag relations from visual semantic sources\",\"authors\":\"Lexing Xie, Xuming He\",\"doi\":\"10.1145/2502081.2502113\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper studies the use of everyday words to describe images. The common saying has it that 'a picture is worth a thousand words', here we ask which thousand? The proliferation of tagged social multimedia data presents a challenge to understanding collective tag-use at large scale -- one can ask if patterns from photo tags help understand tag-tag relations, and how it can be leveraged to improve visual search and recognition. We propose a new method to jointly analyze three distinct visual knowledge resources: Flickr, ImageNet/WordNet, and ConceptNet. This allows us to quantify the visual relevance of both tags learn their relationships. We propose a novel network estimation algorithm, Inverse Concept Rank, to infer incomplete tag relationships. We then design an algorithm for image annotation that takes into account both image and tag features. We analyze over 5 million photos with over 20,000 visual tags. The statistics from this collection leads to good results for image tagging, relationship estimation, and generalizing to unseen tags. This is a first step in analyzing picture tags and everyday semantic knowledge. Potential other applications include generating natural language descriptions of pictures, as well as validating and supplementing knowledge databases.\",\"PeriodicalId\":20448,\"journal\":{\"name\":\"Proceedings of the 21st ACM international conference on Multimedia\",\"volume\":\"6 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-10-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"29\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 21st ACM international conference on Multimedia\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2502081.2502113\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 21st ACM international conference on Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2502081.2502113","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 29

摘要

本文研究了日常用语对意象的描述。俗话说“一图胜千言”，我们要问是哪一千个?标记的社交多媒体数据的激增对理解大规模的集体标签使用提出了挑战——人们可以问照片标签的模式是否有助于理解标签-标签关系，以及如何利用它来改进视觉搜索和识别。我们提出了一种新的方法来联合分析三个不同的视觉知识资源:Flickr、ImageNet/WordNet和ConceptNet。这使我们能够量化两个标签的视觉相关性，了解它们的关系。我们提出了一种新的网络估计算法，逆概念秩，以推断不完全标签关系。然后，我们设计了一种同时考虑图像和标签特征的图像标注算法。我们分析了超过500万张照片和超过2万个视觉标签。这个集合的统计数据对于图像标记、关系估计和推广到未见过的标记都有很好的效果。这是分析图片标签和日常语义知识的第一步。潜在的其他应用包括生成图片的自然语言描述，以及验证和补充知识库。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Picture tags and world knowledge: learning tag relations from visual semantic sources

This paper studies the use of everyday words to describe images. The common saying has it that 'a picture is worth a thousand words', here we ask which thousand? The proliferation of tagged social multimedia data presents a challenge to understanding collective tag-use at large scale -- one can ask if patterns from photo tags help understand tag-tag relations, and how it can be leveraged to improve visual search and recognition. We propose a new method to jointly analyze three distinct visual knowledge resources: Flickr, ImageNet/WordNet, and ConceptNet. This allows us to quantify the visual relevance of both tags learn their relationships. We propose a novel network estimation algorithm, Inverse Concept Rank, to infer incomplete tag relationships. We then design an algorithm for image annotation that takes into account both image and tag features. We analyze over 5 million photos with over 20,000 visual tags. The statistics from this collection leads to good results for image tagging, relationship estimation, and generalizing to unseen tags. This is a first step in analyzing picture tags and everyday semantic knowledge. Potential other applications include generating natural language descriptions of pictures, as well as validating and supplementing knowledge databases.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 21st ACM international conference on Multimedia

自引率

0.00%

发文量