Image annotation by semantic sparse recoding of visual content

Proceedings of the 20th ACM International Conference on Multimedia. Pub Date: 2012-10-29. DOI: 10.1145/2393347.2393418

Zhiwu Lu, Yuxin Peng


Citations: 20

Abstract

This paper presents a new semantic sparse recoding method that generates a more descriptive and robust representation of visual content for image annotation. Although the visual bag-of-words (BOW) representation has been reported to achieve promising results in image annotation, its visual codebook is learnt entirely from low-level visual features using quantization techniques, and the so-called semantic gap therefore remains unbridged. To address this challenging issue, we use both the annotations of training images and the predicted annotations of test images to improve the original visual BOW representation. This improvement is formulated as a sparse coding problem, so that the noise induced by inaccurate quantization of visual features can also be handled to some extent. By developing an efficient sparse coding algorithm, we generate a new visual BOW representation for image annotation. Because this sparse coding incorporates high-level semantic information into the original visual codebook, we regard it as semantic sparse recoding of the visual content. Although traditional image annotation refinement also takes the predicted annotations of test images as input, this paper focuses on refining the visual BOW representation for image annotation. Experimental results on two benchmark datasets show the superior performance of our semantic sparse recoding method in image annotation.
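The abstract does not state the objective function or the algorithm, so the following is only a minimal sketch of generic L1-regularized sparse coding, solved with iterative soft-thresholding (ISTA). It is not the authors' method. It assumes a BOW histogram `x` is re-expressed over a hypothetical dictionary `D` (which, in the paper's setting, would presumably carry the semantic information), and the names `sparse_code`, `soft_threshold`, `lam`, and `n_iter` are illustrative choices.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of the L1 norm)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(x, D, lam=0.1, n_iter=200):
    """Approximately minimize 0.5*||x - D a||^2 + lam*||a||_1 with ISTA.

    x: list of m floats, e.g. a visual BOW histogram (assumed input).
    D: list of m rows with k floats each (hypothetical dictionary).
    Returns the k-dimensional sparse code a.
    """
    m, k = len(D), len(D[0])
    # Step size 1/L, where L bounds the largest eigenvalue of D^T D.
    # The squared Frobenius norm is a simple, conservative bound.
    L = sum(D[i][j] ** 2 for i in range(m) for j in range(k)) or 1.0
    a = [0.0] * k
    for _ in range(n_iter):
        # Residual r = D a - x
        r = [sum(D[i][j] * a[j] for j in range(k)) - x[i] for i in range(m)]
        # Gradient of the least-squares term: g = D^T r
        g = [sum(D[i][j] * r[i] for i in range(m)) for j in range(k)]
        # Gradient step followed by soft-thresholding
        a = [soft_threshold(a[j] - g[j] / L, lam / L) for j in range(k)]
    return a
```

With an identity dictionary the result reduces to soft-thresholding the input, so small (noisy) histogram entries are set to zero while large ones are kept with a slight shrinkage. This illustrates in a toy setting how a sparsity penalty can suppress quantization noise.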



