{"title":"Learning from the uncertain: leveraging social communities to generate reliable training data for visual concept detection tasks","authors":"C. Hentschel, Harald Sack","doi":"10.1145/2809563.2809587","DOIUrl":null,"url":null,"abstract":"Recent advances for visual concept detection based on deep convolutional neural networks have only been successful because of the availability of huge training datasets provided by benchmarking initiatives such as ImageNet. Assembly of reliably annotated training data still is a largely manual effort and can only be approached efficiently as crowd-working tasks. On the other hand, user generated photos and annotations are available at almost no costs in social photo communities such as Flickr. Leveraging the information available in these communities may help to extend existing datasets as well as to create new ones for completely different classification scenarios. However, user generated annotations of photos are known to be incomplete, subjective and do not necessarily relate to the depicted content. In this paper, we therefore present an approach to reliably identify photos relevant for a given visual concept category. We have downloaded additional metadata for 1 million Flickr images and have trained a language model based on user generated annotations. Relevance estimation is based on accordance of an image's annotation data with our language model and on subsequent visual re-ranking. Experimental results demonstrate the potential of the proposed method -- comparison with a baseline approach based on single tag matching shows significant improvements.","PeriodicalId":20526,"journal":{"name":"Proceedings of the 15th International Conference on Knowledge Technologies and Data-driven Business","volume":"236 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2015-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 15th International Conference on Knowledge Technologies and Data-driven Business","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2809563.2809587","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Recent advances in visual concept detection based on deep convolutional neural networks have only been possible because of the availability of huge training datasets provided by benchmarking initiatives such as ImageNet. Assembling reliably annotated training data is still a largely manual effort and can only be handled efficiently through crowd-working tasks. On the other hand, user-generated photos and annotations are available at almost no cost in social photo communities such as Flickr. Leveraging the information available in these communities may help to extend existing datasets as well as to create new ones for completely different classification scenarios. However, user-generated annotations of photos are known to be incomplete, subjective, and not necessarily related to the depicted content. In this paper, we therefore present an approach to reliably identify photos relevant to a given visual concept category. We have downloaded additional metadata for 1 million Flickr images and have trained a language model based on user-generated annotations. Relevance estimation is based on the accordance of an image's annotation data with our language model and on subsequent visual re-ranking. Experimental results demonstrate the potential of the proposed method: comparison with a baseline approach based on single tag matching shows significant improvements.
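The abstract does not specify the form of the tag language model or the relevance score, so the following is only a minimal sketch of the general idea: estimate a smoothed unigram model over user-generated tags for a concept category and rank candidate photos by how well their tags accord with that model, before any visual re-ranking. All function names and the toy data are hypothetical.

```python
from collections import Counter
import math


def train_tag_language_model(tagged_photos):
    """Estimate a Laplace-smoothed unigram model over user tags.

    tagged_photos: iterable of tag lists, e.g. [["sunset", "beach"], ...],
    assumed to come from photos already associated with the concept.
    """
    counts = Counter(tag for tags in tagged_photos for tag in tags)
    total = sum(counts.values())
    vocab = len(counts)

    def probability(tag):
        # Add-one smoothing so unseen tags keep a small non-zero probability.
        return (counts[tag] + 1) / (total + vocab + 1)

    return probability


def relevance_score(photo_tags, probability):
    """Score a photo by the average log-likelihood of its tags under the model."""
    if not photo_tags:
        return float("-inf")
    return sum(math.log(probability(t)) for t in photo_tags) / len(photo_tags)


# Hypothetical usage: rank candidate Flickr photos for one concept category;
# the top-ranked subset would then be passed to a visual re-ranking stage
# (not shown here).
concept_photos = [["sunset", "beach", "ocean"], ["sunset", "sky", "clouds"]]
probability = train_tag_language_model(concept_photos)

candidates = {"img_1": ["sunset", "sea"], "img_2": ["cat", "indoor"]}
ranked = sorted(candidates,
                key=lambda k: relevance_score(candidates[k], probability),
                reverse=True)
print(ranked)
```

A simple average log-likelihood keeps the score comparable across photos with different numbers of tags; the single-tag-matching baseline mentioned in the abstract would correspond to checking only whether one specific tag is present.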