Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition

2013 IEEE International Conference on Computer Vision, Pub Date: 2013-12-01, DOI: 10.1109/ICCV.2013.213

Hans Lobel, R. Vidal, Á. Soto

Pages: 1697-1704

Citations: 17

Abstract

Bag-of-Visual-Words (BoVW) and part-based methods are currently the most popular approaches to visual recognition. In both cases, a mid-level representation is built on top of low-level image descriptors, and top-level classifiers use this mid-level representation to perform recognition. In current part-based approaches, mid- and top-level representations are usually trained jointly, but this is rarely the case for BoVW schemes. A main reason is the complex data association problem that arises from the large dictionary sizes BoVW approaches typically need. Moreover, typical solutions based on BoVW and part-based representations are usually limited to extensions of binary classification schemes, a strategy that ignores relevant correlations among classes. In this work we propose a novel hierarchical approach to visual recognition, based on a BoVW scheme, that jointly learns suitable mid- and top-level representations. Furthermore, by using a max-margin learning framework, the proposed approach directly handles the multiclass case at both levels of abstraction. We test the proposed method on several popular benchmark datasets. As our main result, we demonstrate that, by coupling the learning of mid- and top-level representations, the proposed approach fosters sharing of discriminative visual words among target classes and achieves state-of-the-art recognition performance using far fewer visual words than previous approaches.
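The abstract does not spell out the formulation, so the following is only a rough sketch of the two levels it refers to, not the authors' joint training procedure. It assumes hard nearest-word assignment for the BoVW mid-level representation and a Crammer-Singer style multiclass hinge loss as one common instance of multiclass max-margin learning. The function names `bovw_histogram` and `multiclass_hinge_loss` and all array shapes are hypothetical, chosen for illustration.

```python
import numpy as np


def bovw_histogram(descriptors, dictionary):
    """Mid-level step (assumed): hard-assign each local descriptor to its
    nearest visual word and return an L1-normalized word histogram."""
    # Squared Euclidean distances, shape (num_descriptors, num_words).
    diffs = descriptors[:, None, :] - dictionary[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    nearest = np.argmin(dists, axis=1)
    counts = np.bincount(nearest, minlength=dictionary.shape[0]).astype(float)
    # Guard against an empty descriptor set.
    return counts / max(counts.sum(), 1.0)


def multiclass_hinge_loss(weights, feature, label):
    """Top-level step (assumed): Crammer-Singer multiclass hinge loss,
    max(0, 1 + max_{c != label} score_c - score_label), for one example."""
    scores = weights @ feature              # one score per class
    margins = 1.0 + scores - scores[label]  # margin violation per class
    margins[label] = 0.0                    # the true class is not a competitor
    return max(0.0, float(np.max(margins)))


if __name__ == "__main__":
    # Toy data: two visual words in 2-D and four local descriptors.
    dictionary = np.array([[0.0, 0.0], [10.0, 10.0]])
    descriptors = np.array([[0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [0.5, 0.5]])
    hist = bovw_histogram(descriptors, dictionary)
    # One linear scorer per class over the histogram (identity for the toy case).
    weights = np.eye(2)
    print(hist, multiclass_hinge_loss(weights, hist, label=0))
```

In this sketch the dictionary is fixed and only the top-level loss is evaluated. The coupling described in the abstract would require the dictionary and the classifier weights to be optimized together under a shared max-margin objective, which this illustration does not attempt.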
