Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition

Hans Lobel, R. Vidal, Á. Soto
Published in: 2013 IEEE International Conference on Computer Vision, 2013-12-01
DOI: 10.1109/ICCV.2013.213 (https://doi.org/10.1109/ICCV.2013.213)
Citations: 17

Abstract

Currently, Bag-of-Visual-Words (BoVW) and part-based methods are the most popular approaches to visual recognition. In both cases, a mid-level representation is built on top of low-level image descriptors, and top-level classifiers use this mid-level representation to perform recognition. In current part-based approaches, mid- and top-level representations are usually trained jointly, but this is rarely the case for BoVW schemes. A main reason is the complex data association problem that arises from the large dictionary sizes BoVW approaches typically require. In addition, typical solutions based on BoVW and part-based representations are usually limited to extensions of binary classification schemes, a strategy that ignores relevant correlations among classes. In this work we propose a novel hierarchical approach to visual recognition, based on a BoVW scheme, that jointly learns suitable mid- and top-level representations. Furthermore, by using a max-margin learning framework, the proposed approach directly handles the multiclass case at both levels of abstraction. We test the proposed method on several popular benchmark datasets. As our main result, we demonstrate that, by coupling the learning of mid- and top-level representations, the proposed approach fosters the sharing of discriminative visual words among target classes and achieves state-of-the-art recognition performance using far fewer visual words than previous approaches.
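As a rough illustration of the two levels the abstract refers to, the following sketch builds a BoVW histogram (mid level) and evaluates a multiclass max-margin loss on it (top level). It is not the joint learning procedure of the paper: it assumes plain hard assignment to the nearest visual word and a Crammer-Singer style multiclass hinge loss, and the names `bovw_histogram`, `multiclass_hinge`, the toy dictionary, and the toy weights are all hypothetical.

```python
def bovw_histogram(descriptors, dictionary):
    """Hard-assign each local descriptor to its nearest visual word and
    return an L1-normalized histogram (an assumed mid-level representation)."""
    counts = [0.0] * len(dictionary)
    for d in descriptors:
        # Index of the closest visual word by squared Euclidean distance.
        nearest = min(
            range(len(dictionary)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, dictionary[k])),
        )
        counts[nearest] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def multiclass_hinge(weights, x, y):
    """Crammer-Singer style multiclass hinge loss for one example:
    max(0, 1 + max_{c != y} w_c . x - w_y . x)."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    runner_up = max(s for c, s in enumerate(scores) if c != y)
    return max(0.0, 1.0 + runner_up - scores[y])


# Toy data (assumed): 2-D descriptors, 3 visual words, 2 classes.
dictionary = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 0.8]]

hist = bovw_histogram(descriptors, dictionary)  # mid-level representation
weights = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]    # one linear classifier per class
loss = multiclass_hinge(weights, hist, y=1)     # top-level max-margin loss
```

In this sketch the dictionary is fixed and only the loss is evaluated. In a jointly trained scheme of the kind the abstract describes, the visual words themselves would also be updated under the max-margin objective, which this example does not attempt.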