Mahalanobis Encodings for Visual Categorization

Q1 Computer Science

IPSJ Transactions on Computer Vision and Applications Pub Date : 2015-01-01 DOI:10.2197/ipsjtcva.7.69

Tomoki Matsuzawa, Raissa Relator, Wataru Takei, S. Omachi, Tsuyoshi Kato

Citations: 3

Abstract

The design of the image representation is now one of the most crucial factors in the performance of visual categorization. A common pipeline in most recent research for obtaining an image representation consists of two steps: the encoding step and the pooling step. In this paper, we introduce the Mahalanobis metric into two popular image patch encoding modules, Histogram Encoding and Fisher Encoding, which are used in the Bag-of-Visual-Words method and the Fisher Vector method, respectively. Moreover, for the proposed Fisher Vector method, a closed-form approximation of the Fisher Vector can be derived under the same assumption used in the original Fisher Vector, and the codebook is built without resorting to time-consuming EM (Expectation-Maximization) steps. Experimental evaluation on multi-class classification demonstrates the effectiveness of the proposed encoding methods.
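The abstract does not give the encoding formulas, so the following is only a minimal sketch of what histogram encoding under a Mahalanobis metric might look like, not the authors' implementation. It assumes a codebook in which each visual word has a center and its own covariance matrix; the function name `mahalanobis_histogram` and its parameters are hypothetical. Each patch descriptor is hard-assigned to the nearest word under the squared Mahalanobis distance (encoding step), and the assignments are counted into a normalized histogram (pooling step).

```python
import numpy as np

def mahalanobis_histogram(descriptors, centers, covariances):
    """Illustrative sketch (assumed formulation, not taken from the paper).

    descriptors: (n, d) array of patch descriptors
    centers:     (k, d) array of codeword centers
    covariances: (k, d, d) array, one covariance matrix per codeword
    Returns a length-k histogram that sums to 1 (all zeros if n == 0).
    """
    n_words = centers.shape[0]
    dists = np.empty((descriptors.shape[0], n_words))
    for k in range(n_words):
        diff = descriptors - centers[k]
        precision = np.linalg.inv(covariances[k])
        # Squared Mahalanobis distance: diff^T * precision * diff, per row.
        dists[:, k] = np.einsum("ni,ij,nj->n", diff, precision, diff)
    # Encoding step: hard assignment to the nearest codeword.
    assignments = dists.argmin(axis=1)
    # Pooling step: count assignments per codeword and normalize.
    hist = np.bincount(assignments, minlength=n_words).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

With identity covariances this reduces to ordinary Euclidean nearest-codeword assignment, so the choice of metric only changes the result when the per-word covariances differ from one another.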

Source journal

IPSJ Transactions on Computer Vision and Applications Computer Science-Computer Vision and Pattern Recognition

Self-citation rate

0.00%

Publication count