Spatial codebooks for image categorization

Proceedings of the 1st ACM International Conference on Multimedia Retrieval. Pub. date: 2011-04-18. DOI: 10.1145/1991996.1992046

Eugene Mbanya, S. Gerke, P. Ndjiki-Nya


Citations: 9

Abstract

Currently, bag-of-words approaches for image categorization are very popular due to their relative simplicity, robustness and high efficiency. However, they cannot represent the spatial composition of an image. Several approaches address this drawback, with spatial pyramids being the most popular. Spatial pyramids divide an image into smaller blocks and compute a feature vector for each block. These block feature vectors are concatenated to form the feature vector of the whole image, so its dimension grows by a factor equal to the number of blocks the image is divided into. Consequently, computation time increases in proportion to the number of blocks. We propose extending the image feature vector with spatial features, which yields a descriptor of similar size to that of the standard bag-of-words approach. Nevertheless, its classification performance is similar to that of spatial pyramids, which use significantly larger feature vectors and are therefore more computationally expensive.
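To make the dimensionality argument concrete, here is a minimal Python sketch of a plain bag-of-words histogram next to a spatial-pyramid descriptor. It assumes local features have already been quantized to codeword indices and that level l of the pyramid uses a 2^l x 2^l grid, as in the common spatial pyramid layout. The function names (bow_histogram, spatial_pyramid, pyramid_dim), the (x, y, word) feature tuple, and the omission of per-level weighting and normalization are illustrative assumptions, not details from the paper. The sketch does not implement the authors' spatial-codebook method, which the abstract does not specify.

```python
from typing import List, Sequence, Tuple

# Hypothetical feature format: (x, y, word), where word is the index of the
# nearest codeword. Codebook construction and quantization are assumed done.
Feature = Tuple[float, float, int]


def bow_histogram(features: Sequence[Feature], k: int) -> List[int]:
    """Plain bag-of-words: count each visual word, ignoring its position."""
    hist = [0] * k
    for _, _, word in features:
        hist[word] += 1
    return hist


def spatial_pyramid(features: Sequence[Feature], k: int, width: float,
                    height: float, levels: int) -> List[int]:
    """Concatenate one bag-of-words histogram per block for levels 0..levels.

    Level l splits the image into a 2**l x 2**l grid of blocks. Level
    weighting and normalization are deliberately omitted in this sketch.
    """
    vector: List[int] = []
    for level in range(levels + 1):
        cells = 2 ** level
        blocks: List[List[Feature]] = [[] for _ in range(cells * cells)]
        for x, y, word in features:
            # Map the position to a grid cell; clamp points on the far border.
            col = min(int(x / width * cells), cells - 1)
            row = min(int(y / height * cells), cells - 1)
            blocks[row * cells + col].append((x, y, word))
        for block in blocks:
            vector.extend(bow_histogram(block, k))
    return vector


def pyramid_dim(k: int, levels: int) -> int:
    """Descriptor length: k * (1 + 4 + ... + 4**levels) = k * (4**(levels+1) - 1) / 3."""
    return k * (4 ** (levels + 1) - 1) // 3


if __name__ == "__main__":
    feats = [(10, 10, 0), (90, 10, 1), (10, 90, 2), (90, 90, 2)]
    print(len(bow_histogram(feats, 3)))                           # 3
    print(len(spatial_pyramid(feats, 3, 100, 100, levels=2)))    # 63
```

For example, with a codebook of K = 1000 visual words and three pyramid levels (1 + 4 + 16 = 21 blocks), the pyramid descriptor has 21,000 entries against 1,000 for plain bag-of-words. That factor of 21 is the growth in dimension, and in classification cost, that the abstract refers to.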



